Buglets in equivclass.c

Started by Tom Laneover 5 years ago3 messageshackers
Jump to latest
#1Tom Lane
tgl@sss.pgh.pa.us

While hacking on a patch that touches equivclass.c, I came across
a couple of things that seemed wrong, and are fixed by the attached
proposed patch.

First, get_eclass_for_sort_expr() computes expr_relids and nullable_relids
too soon. This is a waste of a not-really-trivial number of cycles in
the common cases where it finds an existing eclass or is told not to
make a new one. More subtly, the bitmapsets are computed in the caller's
context. If we do use them, they will be attached to an EquivalenceClass
that lives in the potentially-longer-lived root->planner_cxt, allowing
the EC's pointers to them to become dangling. This would be a live bug
if get_eclass_for_sort_expr() could be called with create_it = true during
GEQO planning. So far as I can find, it is not; but both its API spec and
its internal comments certainly give the impression that that's allowed.

Second, generate_join_implied_equalities() uses inner_rel->relids to
look up relevant eclasses, but given the surrounding code it seems like
it ought to be using nominal_inner_relids. The code accidentally works
because a child RelOptInfo will always have exactly the same
eclass_indexes as its top parent; but if it did not, we'd risk either
missing some relevant eclasses or hitting the assertion that claims
that all the eclasses we find overlap nominal_join_relids. (I noticed
this while speculating that maybe we needn't bother maintaining
eclass_indexes for child RelOptInfos. This code is one place that
would fail if we didn't.)

I'm unsure whether to back-patch either of these. They both seem to be
just latent bugs so far as the core code is concerned, but the first one
in particular seems like something that could bite extension code.
Thoughts?

regards, tom lane

Attachments:

fix-equivclass-buglets.patchtext/x-diff; charset=us-ascii; name=fix-equivclass-buglets.patchDownload+8-8
#2David Rowley
dgrowleyml@gmail.com
In reply to: Tom Lane (#1)
Re: Buglets in equivclass.c

On Mon, 5 Oct 2020 at 06:29, Tom Lane <tgl@sss.pgh.pa.us> wrote:

While hacking on a patch that touches equivclass.c, I came across
a couple of things that seemed wrong, and are fixed by the attached
proposed patch.

First, get_eclass_for_sort_expr() computes expr_relids and nullable_relids
too soon. This is a waste of a not-really-trivial number of cycles in
the common cases where it finds an existing eclass or is told not to
make a new one. More subtly, the bitmapsets are computed in the caller's
context. If we do use them, they will be attached to an EquivalenceClass
that lives in the potentially-longer-lived root->planner_cxt, allowing
the EC's pointers to them to become dangling. This would be a live bug
if get_eclass_for_sort_expr() could be called with create_it = true during
GEQO planning. So far as I can find, it is not; but both its API spec and
its internal comments certainly give the impression that that's allowed.

hmm, yeah, it seems fairly low-risk to be moving that down to after we
switch the memory context. Seems like the sort of thing that could
easily become an actual bug one day.

Second, generate_join_implied_equalities() uses inner_rel->relids to
look up relevant eclasses, but given the surrounding code it seems like
it ought to be using nominal_inner_relids. The code accidentally works
because a child RelOptInfo will always have exactly the same
eclass_indexes as its top parent; but if it did not, we'd risk either
missing some relevant eclasses or hitting the assertion that claims
that all the eclasses we find overlap nominal_join_relids. (I noticed
this while speculating that maybe we needn't bother maintaining
eclass_indexes for child RelOptInfos. This code is one place that
would fail if we didn't.)

Oops. That certainly should be using nominal_inner_relids rather than
innerrel->relids. I agree it's not a live bug since child ECs are just
a copy of their parents, currently.

I'm unsure whether to back-patch either of these. They both seem to be
just latent bugs so far as the core code is concerned, but the first one
in particular seems like something that could bite extension code.
Thoughts?

That's a good question. I'm leaning towards backpatching both of them.
The 2nd is new as of v13, so it does not seem unreasonable that
someone has just not yet stumbled on it with some extension that adds
extra child ECs. I can imagine a use case for that, by getting rid of
needless equality quals that duplicate the partition constraint. The
fix for the first just seems neater/faster/correct, so I can't really
see any reason not to backpatch it.

David

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: David Rowley (#2)
Re: Buglets in equivclass.c

David Rowley <dgrowleyml@gmail.com> writes:

On Mon, 5 Oct 2020 at 06:29, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I'm unsure whether to back-patch either of these. They both seem to be
just latent bugs so far as the core code is concerned, but the first one
in particular seems like something that could bite extension code.
Thoughts?

That's a good question. I'm leaning towards backpatching both of them.
The 2nd is new as of v13, so it does not seem unreasonable that
someone has just not yet stumbled on it with some extension that adds
extra child ECs. I can imagine a use case for that, by getting rid of
needless equality quals that duplicate the partition constraint. The
fix for the first just seems neater/faster/correct, so I can't really
see any reason not to backpatch it.

Yeah, that's my conclusion too after sleeping on it. Pushed,
thanks for reviewing.

regards, tom lane