BUG #16536: Segfault with partition-wise joins

Started by PG Bug reporting formalmost 6 years ago4 messagesbugs
Jump to latest
#1PG Bug reporting form
noreply@postgresql.org

The following bug has been logged on the website:

Bug reference: 16536
Logged by: Andrew Gierth
Email address: andrew@tao11.riddles.org.uk
PostgreSQL version: 11.8
Operating system: any
Description:

Reported on 11.8 but seems to be still present in HEAD. My testcase,
generated from a report on IRC by Arne Roland.

The problem seems to be that in conditions of a bitmap index scan under a
bitmapAnd, in the case of a partitionwise join, a Var isn't being replaced
properly by a Param, resulting in a crash when we attempt to validate the
expression since no slot is available for the Var. The bitmapAnd seems to be
required to trip the bug, with a single bitmap index scan it does not
manifest.

create table ax1 (a integer, b integer, c integer) partition by range (a);
create table ax2 (a integer, b integer, c integer) partition by range (a);
create table ax1_1 partition of ax1 for values from (1) to (11);
create table ax1_2 partition of ax1 for values from (11) to (21);
create table ax1_3 partition of ax1 for values from (21) to (31);
create table ax2_1 partition of ax2 for values from (1) to (11);
create table ax2_2 partition of ax2 for values from (11) to (21);
create table ax2_3 partition of ax2 for values from (21) to (31);
create index on ax2 (b);
create index on ax2 (c);
insert into ax2 select 1 + i%30, i, i from generate_series(1,1000) i,
generate_series(1,10) j;
insert into ax1 select 1 + i%30, i, i from generate_series(1,1000) i;
vacuum analyze ax1;
vacuum analyze ax2;
set enable_partitionwise_join = on;
set enable_indexscan=false;
explain select * from ax1 where not exists (select from ax2 where
ax2.a=ax1.a and ax2.b=ax1.b and ax2.c=123) and a=1 and c=120;
select * from ax1 where not exists (select from ax2 where ax2.a=ax1.a and
ax2.b=ax1.b and ax2.c=123) and a=1 and c=120;

The explain ends up with this in it:

-> Bitmap Index Scan on ax2_1_b_idx1 (cost=0.00..4.43 rows=20
width=0)
Index Cond: (b = b)

and inspection of the node tree shows that that second "b" is in fact a Var
not a Param.

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: PG Bug reporting form (#1)
Re: BUG #16536: Segfault with partition-wise joins

PG Bug reporting form <noreply@postgresql.org> writes:

The problem seems to be that in conditions of a bitmap index scan under a
bitmapAnd, in the case of a partitionwise join, a Var isn't being replaced
properly by a Param, resulting in a crash when we attempt to validate the
expression since no slot is available for the Var. The bitmapAnd seems to be
required to trip the bug, with a single bitmap index scan it does not
manifest.

I believe I see the problem. create_bitmap_and_path and
create_bitmap_or_path figure that they can cheat and not
mark the path with parameterization info:

pathnode->path.param_info = NULL; /* not used in bitmap trees */

However, reparameterize_path_by_child thinks it only has to
recurse into sub-paths that have relevant parameterization:

/*
* If the path is not parameterized by parent of the given relation, it
* doesn't need reparameterization.
*/
if (!path->param_info ||
!bms_overlap(PATH_REQ_OUTER(path), child_rel->top_parent_relids))
return path;

So, if we try to reparameterize a parameterized bitmap path,
nothing happens to the children of BitmapAnd/Or sub-paths,
and we end up with a broken path leading to a busted plan.

It's a bit embarrassing that this has escaped detection.
I see from coverage.postgresql.org that we have zero coverage
for bitmap paths here, as well as several other path types,
which now one must wonder if those cases work either.
(It's not real clear to me if this code can be reached in
cases other than enable_partitionwise_join; the effects might
be pretty limited if not.)

Anyway, I can see a couple of routes to a fix:

(1) Change create_bitmap_and_path and create_bitmap_or_path
to account for parameterization honestly. This is certainly
the cleanest fix, but it would add some cycles, and what's
probably a bigger issue for back-patching is that their
signatures would have to change. Maybe that's okay?
There's probably not a reason for external code to call them,
and codesearch.debian.net knows of no instances of that.

(2) Hack up reparameterize_path_by_child so that it will descend
into these nodes regardless of their parameterization markers.
That's okay from an efficiency standpoint, since we'd already
have stopped at the parent BitmapHeapPath if it weren't
parameterized. But it seems quite ugly.

Another point I notice is that reparameterize_path thinks it
doesn't need to touch sub-structure at all when increasing
the parameterization of a BitmapHeapPath. Maybe that's okay
but it probably deserves more thought, especially since I see
that the case is again untested.

Thoughts?

regards, tom lane

#3Andrew Gierth
andrew@tao11.riddles.org.uk
In reply to: Tom Lane (#2)
Re: BUG #16536: Segfault with partition-wise joins

"Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:

Tom> Anyway, I can see a couple of routes to a fix:

Tom> (1) Change create_bitmap_and_path and create_bitmap_or_path to
Tom> account for parameterization honestly. This is certainly the
Tom> cleanest fix, but it would add some cycles, and what's probably a
Tom> bigger issue for back-patching is that their signatures would have
Tom> to change. Maybe that's okay? There's probably not a reason for
Tom> external code to call them, and codesearch.debian.net knows of no
Tom> instances of that.

Tom> (2) Hack up reparameterize_path_by_child so that it will descend
Tom> into these nodes regardless of their parameterization markers.
Tom> That's okay from an efficiency standpoint, since we'd already have
Tom> stopped at the parent BitmapHeapPath if it weren't parameterized.
Tom> But it seems quite ugly.

Well the obvious compromise fix is to do 2 in the back-branches and 1 in
head, but that may be overkill...

Tom> Another point I notice is that reparameterize_path thinks it
Tom> doesn't need to touch sub-structure at all when increasing the
Tom> parameterization of a BitmapHeapPath. Maybe that's okay but it
Tom> probably deserves more thought, especially since I see that the
Tom> case is again untested.

Hmm. I'm not sure I fully understand the implications of what's going on
there, but if new quals are effectively being moved into the path as a
result of the reparameterization, then leaving the substructure alone
would presumably mean that those new quals can only become Filter:
clauses. But presumably, if they could be usefully indexed, then we
would have already generated a parameterized path that included them?

--
Andrew (irc:RhodiumToad)

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Gierth (#3)
Re: BUG #16536: Segfault with partition-wise joins

Andrew Gierth <andrew@tao11.riddles.org.uk> writes:

"Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:
Tom> Anyway, I can see a couple of routes to a fix:

Tom> (1) Change create_bitmap_and_path and create_bitmap_or_path to
Tom> account for parameterization honestly. This is certainly the
Tom> cleanest fix, but it would add some cycles, and what's probably a
Tom> bigger issue for back-patching is that their signatures would have
Tom> to change. Maybe that's okay? There's probably not a reason for
Tom> external code to call them, and codesearch.debian.net knows of no
Tom> instances of that.

Tom> (2) Hack up reparameterize_path_by_child so that it will descend
Tom> into these nodes regardless of their parameterization markers.
Tom> That's okay from an efficiency standpoint, since we'd already have
Tom> stopped at the parent BitmapHeapPath if it weren't parameterized.
Tom> But it seems quite ugly.

Well the obvious compromise fix is to do 2 in the back-branches and 1 in
head, but that may be overkill...

I realized there's another way, which is to make create_bitmap_and_path
and create_bitmap_or_path compute the correct parameterization as the
union of the outer rels required by their child paths. Then they don't
need any additional arguments. I had been thinking of having indxpath.c
pass down the parameterization, but it turns out that that code isn't
tracking that anyway, so it would have largely had to do it like this
anyhow.

As a further benefit, now that BitmapAndPath and BitmapOrPath
aren't special in this way, we can get rid of indxpath.c's
get_bitmap_tree_required_outer() function; it's sufficient to do
PATH_REQ_OUTER() on the top-level path node in a bitmap tree.
So the attached patch actually nets out at less code than before.

You could argue I guess that although this isn't an ABI break, it's
still an API change: now these functions are required to set the
param_info field in their output correctly. That would only break
external code if it's building such Path structures by hand rather
than using these functions. I devoutly hope nobody is ... though
I did have to slap down some code in indxpath.c which was doing
exactly that.

This lacks regression test case(s) as yet, but I think it's
commitable otherwise. I'm going to try to build some test cases
that improve the coverage report in this area...

Tom> Another point I notice is that reparameterize_path thinks it
Tom> doesn't need to touch sub-structure at all when increasing the
Tom> parameterization of a BitmapHeapPath. Maybe that's okay but it
Tom> probably deserves more thought, especially since I see that the
Tom> case is again untested.

Hmm. I'm not sure I fully understand the implications of what's going on
there, but if new quals are effectively being moved into the path as a
result of the reparameterization, then leaving the substructure alone
would presumably mean that those new quals can only become Filter:
clauses. But presumably, if they could be usefully indexed, then we
would have already generated a parameterized path that included them?

We're not actually moving any new quals into the path, we're just changing
its labeling. Given the way I've done things here, I no longer think it's
a problem. I'd worried that we might need BitmapAnd/OrPaths to have
exactly the same required_outer as their parent BitmapHeapPath, but in
the event it should be fine when they have subset required_outer values.

regards, tom lane

Attachments:

fix-bug-16536.patchtext/x-diff; charset=us-ascii; name=fix-bug-16536.patchDownload+51-106