Patch to fix FK-related selectivity estimates with constants

Started by Tom Laneover 5 years ago8 messageshackers
Jump to latest
#1Tom Lane
tgl@sss.pgh.pa.us

Over in the thread at [1]/messages/by-id/AM6PR02MB5287A0ADD936C1FA80973E72AB190@AM6PR02MB5287.eurprd02.prod.outlook.com it's discussed how our code for making
selectivity estimates using knowledge about FOREIGN KEY constraints
is busted in the face of EquivalenceClasses including constants.

That is, if fktab(a,b) is a 2-column FK reference to pktab(a,b)
and we have a query like

... where fktab.a = pktab.a and fktab.b = pktab.b

then we understand that any given fktab row can match at most one
pktab row (and this estimate is often a lot better than we'd get
from assuming that the a and b conditions are independent).

However, if the query is like

... where fktab.a = pktab.a and fktab.b = pktab.b
and fktab.a = 1

then this suddenly breaks down and we go back to non-FK-aware
estimates. The reason is that get_foreign_key_join_selectivity()
is looking for join clauses that equate the two sides of the FK
constraint ... and in this example, it will not see any such
join clause for column "a". That's because equivclass.c decided
to replace the given clauses with "fktab.a = 1 and pktab.a = 1",
which can be enforced at the scan level, leaving nothing to be
done for column "a" at the join level.

We can fix that by detecting which EquivalenceClasses are marked
"ec_has_const", since that's the property that dictates whether
equivclass.c uses this strategy. However, that's only a partial
fix; if you try it, you soon find that the selectivity estimates
are still off. The reason is that because the two "a = 1" conditions
are already factored into the size estimates for the join input
relations, we're essentially double-counting the "fktab.a = 1"
condition's selectivity if we use the existing FK selectivity
estimation rule. If we treated the constant condition as only
relevant to the PK side, then the FK selectivity rule could work
normally. But we don't want to drop the ability to enforce the
restriction at the scan level. So what we have to do is cancel
the FK side's condition's selectivity out of the FK selectivity.

Attached is a patch series that attacks it that way. For ease of
review I split it into two steps:

0001 refactors process_implied_equality() so that it can pass
back the new RestrictInfo to its callers in equivclass.c.
I think this is a good change on its own merits, because it means
that when generating a derived equality, we don't have to use
initialize_mergeclause_eclasses() to set up the new RestrictInfo's
left_ec and right_ec pointers. The equivclass.c caller knows
perfectly darn well which EquivalenceClass the two sides of the
clause belong to, so it can just assign that value, saving a couple
of potentially-not-cheap get_eclass_for_sort_expr() searches.
This does require process_implied_equality() to duplicate some of
the steps in distribute_qual_to_rels(), but on the other hand we
get to remove some complexity from distribute_qual_to_rels() because
it no longer has to deal with any is_deduced cases. Anyway, the
end goal of this step is that we can save away all the generated
"x = const" clauses in the EC's ec_derives list. 0001 doesn't
do anything with that information, but ...

0002 actually fixes the bug. Dealing with the first part of the
problem just requires counting how many of the ECs we matched to
an FK constraint are ec_has_const. To deal with the second part,
we dig out the scan-level "x = const" clause that the EC generated
for the FK column and see what selectivity it has got. This beats
other ways of reconstructing the scan-clause selectivity because
(at least in all normal cases) that selectivity would have been
cached in the RestrictInfo. Thus we not only save cycles but can be
sure we are cancelling out exactly the right amount of selectivity.

I would not propose back-patching this, but it seems OK for HEAD.
Thoughts?

regards, tom lane

[1]: /messages/by-id/AM6PR02MB5287A0ADD936C1FA80973E72AB190@AM6PR02MB5287.eurprd02.prod.outlook.com

Attachments:

0001-refactor-process_implied_equality.patchtext/x-diff; charset=us-ascii; name=0001-refactor-process_implied_equality.patchDownload+178-89
0002-fix-fk-estimates-with-constants.patchtext/x-diff; charset=us-ascii; name=0002-fix-fk-estimates-with-constants.patchDownload+107-13
#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#1)
Re: Patch to fix FK-related selectivity estimates with constants

I wrote:

Over in the thread at [1] it's discussed how our code for making
selectivity estimates using knowledge about FOREIGN KEY constraints
is busted in the face of EquivalenceClasses including constants.
...
Attached is a patch series that attacks it that way.

I'd failed to generate a test case I liked yesterday, but perhaps
the attached will do. (While the new code is exercised in the
core regression tests already, it doesn't produce any visible
plan changes.) I'm a little nervous about whether the plan
shape will be stable in the buildfarm, but it works for me on
both 64-bit and 32-bit machines, so probably it's OK.

regards, tom lane

Attachments:

0003-add-a-test-case.patchtext/x-diff; charset=us-ascii; name=0003-add-a-test-case.patchDownload+79-0
#3Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Tom Lane (#2)
Re: Patch to fix FK-related selectivity estimates with constants

On Tue, Oct 27, 2020 at 01:58:56PM -0400, Tom Lane wrote:

I wrote:

Over in the thread at [1] it's discussed how our code for making
selectivity estimates using knowledge about FOREIGN KEY constraints
is busted in the face of EquivalenceClasses including constants.
...
Attached is a patch series that attacks it that way.

The patch sems fine to me, thanks for investigating and fixing this.

Two minor comments:

I find it a bit strange that generate_base_implied_equalities_const adds
the rinfo to ec_derives, while generate_base_implied_equalities_no_const
does not. I understand it's correct as we don't lookup the non-const
clauses, and we want to keep the list as short as possible, but it seems
like a bit surprising/unexpected difference in behavior.

I think casting the 'clause' to (Node*) in process_implied_equality is
unnecessary - it was probably needed when it was declared as Expr* but
the patch changes that.

As for the backpatching, I don't feel very strongly about it. It's
clearly a bug/thinko in the code, and I don't see any obvious risks in
backpatching it (no ABI breaks etc.). OTOH multi-column foreign keys are
not very common, and the query pattern seems rather unusual too, so the
risk is pretty low I guess. We certainly did not get many reports, so.

I'd failed to generate a test case I liked yesterday, but perhaps
the attached will do. (While the new code is exercised in the
core regression tests already, it doesn't produce any visible
plan changes.) I'm a little nervous about whether the plan
shape will be stable in the buildfarm, but it works for me on
both 64-bit and 32-bit machines, so probably it's OK.

Works fine on raspberry pi 4 (i.e. armv7l, 32-bit arm) too.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tomas Vondra (#3)
Re: Patch to fix FK-related selectivity estimates with constants

Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:

On Tue, Oct 27, 2020 at 01:58:56PM -0400, Tom Lane wrote:

Attached is a patch series that attacks it that way.

The patch sems fine to me, thanks for investigating and fixing this.

Thanks for looking at it!

I find it a bit strange that generate_base_implied_equalities_const adds
the rinfo to ec_derives, while generate_base_implied_equalities_no_const
does not. I understand it's correct as we don't lookup the non-const
clauses, and we want to keep the list as short as possible, but it seems
like a bit surprising/unexpected difference in behavior.

Yeah, perhaps. I considered replacing ec_derives with two lists, one
for base-level derived clauses and one for join-level derived clauses,
but it didn't really seem worth the trouble. This is something we
could change later if a need arises to be able to look back at non-const
base-level derived clauses.

I think casting the 'clause' to (Node*) in process_implied_equality is
unnecessary - it was probably needed when it was declared as Expr* but
the patch changes that.

Hm, thought I got rid of the unnecessary casts ... I'll look again.

As for the backpatching, I don't feel very strongly about it. It's
clearly a bug/thinko in the code, and I don't see any obvious risks in
backpatching it (no ABI breaks etc.).

I had two concerns about possible extension breakage from a back-patch:

* Changing the set of fields in ForeignKeyOptInfo is an ABI break.
We could minimize the risk by adding the new fields at the end in
the back branches, but it still wouldn't be zero-risk.

* Changing the expectation about whether process_implied_equality()
will fill left_ec/right_ec is an API break. It's somewhat doubtful
whether there exist any callers outside equivclass.c, but again it's
not zero risk.

The other issue, entirely unrelated to code hazards, is whether this
is too big a change in planner behavior to be back-patched. We've
often felt that destabilizing stable-branch plan choices is something
to be avoided.

Not to mention the whole issue of whether this patch has any bugs of
its own.

So on the whole I wouldn't want to back-patch, or at least not do so
very far. Maybe there's an argument that v13 is still new enough to
deflect the concern about plan stability.

regards, tom lane

#5Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Tom Lane (#4)
Re: Patch to fix FK-related selectivity estimates with constants

On Tue, Oct 27, 2020 at 09:27:06PM -0400, Tom Lane wrote:

Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:

On Tue, Oct 27, 2020 at 01:58:56PM -0400, Tom Lane wrote:

Attached is a patch series that attacks it that way.

The patch sems fine to me, thanks for investigating and fixing this.

Thanks for looking at it!

I find it a bit strange that generate_base_implied_equalities_const adds
the rinfo to ec_derives, while generate_base_implied_equalities_no_const
does not. I understand it's correct as we don't lookup the non-const
clauses, and we want to keep the list as short as possible, but it seems
like a bit surprising/unexpected difference in behavior.

Yeah, perhaps. I considered replacing ec_derives with two lists, one
for base-level derived clauses and one for join-level derived clauses,
but it didn't really seem worth the trouble. This is something we
could change later if a need arises to be able to look back at non-const
base-level derived clauses.

I think casting the 'clause' to (Node*) in process_implied_equality is
unnecessary - it was probably needed when it was declared as Expr* but
the patch changes that.

Hm, thought I got rid of the unnecessary casts ... I'll look again.

Apologies, the casts are fine. I got it mixed up somehow.

As for the backpatching, I don't feel very strongly about it. It's
clearly a bug/thinko in the code, and I don't see any obvious risks in
backpatching it (no ABI breaks etc.).

I had two concerns about possible extension breakage from a back-patch:

* Changing the set of fields in ForeignKeyOptInfo is an ABI break.
We could minimize the risk by adding the new fields at the end in
the back branches, but it still wouldn't be zero-risk.

* Changing the expectation about whether process_implied_equality()
will fill left_ec/right_ec is an API break. It's somewhat doubtful
whether there exist any callers outside equivclass.c, but again it's
not zero risk.

The other issue, entirely unrelated to code hazards, is whether this
is too big a change in planner behavior to be back-patched. We've
often felt that destabilizing stable-branch plan choices is something
to be avoided.

Not to mention the whole issue of whether this patch has any bugs of
its own.

So on the whole I wouldn't want to back-patch, or at least not do so
very far. Maybe there's an argument that v13 is still new enough to
deflect the concern about plan stability.

OK, understood.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#6Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Tom Lane (#4)
Re: Patch to fix FK-related selectivity estimates with constants

On 2020-Oct-27, Tom Lane wrote:

I had two concerns about possible extension breakage from a back-patch:

* Changing the set of fields in ForeignKeyOptInfo is an ABI break.
We could minimize the risk by adding the new fields at the end in
the back branches, but it still wouldn't be zero-risk.

It'd be useful to be able to qualify this sort of risk more objectively.
I think if a struct is used as a function argument somewhere or arrays
of the struct are formed, then it's certain that changing that struct's
size is going to cause problems. In this case, at least in core code,
we only pass pointers to the struct around, not the struct itself, so
not a problem; and we only use the struct in lists, not in arrays, so
that's not a problem either.

What other aspects should we consider?

Another angle is usage of the struct by third-party code. I used
codesearch.debian.net and, apart from Postgres itself, it only found the
string in hypopg (but in typedefs.list, so not relevant) and pgpool2
(which appears to carry its own copy of nodes.h). Inconsequential.

Searching the web, I did find this:
https://docs.rs/rpgffi/0.3.3/rpgffi/struct.ForeignKeyOptInfo.html but it
appears that this project (an incomplete attempt at a framework to
create Postgres extensions in Rust) mechanically extracts every struct,
but no further use of the struct is done. I was unable to find anything
actually *using* rpgffi.

It is possible that some proprietary code is using the struct in a way
that would put it in danger, though.

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#6)
Re: Patch to fix FK-related selectivity estimates with constants

Alvaro Herrera <alvherre@alvh.no-ip.org> writes:

On 2020-Oct-27, Tom Lane wrote:

* Changing the set of fields in ForeignKeyOptInfo is an ABI break.
We could minimize the risk by adding the new fields at the end in
the back branches, but it still wouldn't be zero-risk.

It'd be useful to be able to qualify this sort of risk more objectively.

Agreed.

I think if a struct is used as a function argument somewhere or arrays
of the struct are formed, then it's certain that changing that struct's
size is going to cause problems.

I grasp the point about arrays, but not sure how it's a problem for
function arguments per se? Or were you thinking of functions that
take a struct as pass-by-value not pass-by-reference?

The way I've generally thought about this is that new fields added to the
end of a Node struct are only likely to be a hazard if extension code
creates new instances of that Node type. If it does, it's certainly
problematic, first because makeNode() will allocate the wrong amount of
storage (ABI issue) and second because the extension won't know it needs
to fill the new fields (API issue). However if we don't expect that that
will happen, then it's probably going to be OK. Code that just inspects
Nodes made by the core code won't be broken, as long as we don't change
the semantics of the existing fields. We don't ever pass Node structs by
value, and we don't make arrays of them either, so the actual size of the
struct isn't much of an ABI issue.

As you say, we can also search to see if there seem to be any extensions
using the struct in question. I don't have a huge amount of faith in
that, because I think there are lots of proprietary/custom extensions
that aren't visible on the net. But on the other hand, the users
of such extensions probably wouldn't have much trouble rebuilding them
for a new version, if they did get bit. It's the widely distributed
extensions that might have users not capable of dealing with that.

regards, tom lane

#8Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Tom Lane (#7)
Re: Patch to fix FK-related selectivity estimates with constants

On 2020-Oct-28, Tom Lane wrote:

Alvaro Herrera <alvherre@alvh.no-ip.org> writes:

I think if a struct is used as a function argument somewhere or arrays
of the struct are formed, then it's certain that changing that struct's
size is going to cause problems.

I grasp the point about arrays, but not sure how it's a problem for
function arguments per se? Or were you thinking of functions that
take a struct as pass-by-value not pass-by-reference?

Yeah, pass-by-value. As you say we don't do that with Node structs, but
there are some other structs that are sometimes passed by value. It's
certainly not a common problem though.

The way I've generally thought about this is that new fields added to the
end of a Node struct are only likely to be a hazard if extension code
creates new instances of that Node type. If it does, it's certainly
problematic, first because makeNode() will allocate the wrong amount of
storage (ABI issue) and second because the extension won't know it needs
to fill the new fields (API issue).

Right.

As you say, we can also search to see if there seem to be any extensions
using the struct in question. I don't have a huge amount of faith in
that, because I think there are lots of proprietary/custom extensions
that aren't visible on the net. But on the other hand, the users
of such extensions probably wouldn't have much trouble rebuilding them
for a new version, if they did get bit. It's the widely distributed
extensions that might have users not capable of dealing with that.

In practice, at 2ndQuadrant we've had trouble a couple of times with ABI
breaks -- certain situations can become crasher bugs, to which some
customers are extremely sensitive.

I've added a link to your message to the wiki here:
https://wiki.postgresql.org/wiki/Committing_checklist#Maintaining_ABI_compatibility_while_backpatching