Allowing extensions to supply operator-/function-specific info
Over in the thread at [1], we realized that PostGIS has been thrashing
around trying to fake its way to having "special index operators", ie
a way to automatically convert WHERE clauses into lossy index quals.
That's existed in a non-extensible way inside indxpath.c for twenty
years come July. Since the beginning I've thought we should provide
a way for extensions to do similar things, but it never got to the top
of the to-do queue. Now I think it's time.
One low-effort answer is to add a hook call in indxpath.c that lets
extensions manipulate the sets of index clauses extracted from a
relation's qual clauses, but I don't especially like that: it dumps
all the work onto extensions, resulting in lots of code duplication,
plus they have a badly-documented and probably moving target for what
they have to do.
Another bit of technical debt that's even older is the lack of a way
to attach selectivity estimation logic to boolean-returning functions.
So that motivates me to think that whatever we do here should be easily
extensible to allow different sorts of function- or operator-related
knowledge to be supplied by extensions. We already have oprrest,
oprjoin, and protransform hooks that allow certain kinds of knowledge
to be attached to operators and functions, but we need something a bit
more general.
What I'm envisioning therefore is that we allow an auxiliary function to
be attached to any operator or function that can provide functionality
like this, and that we set things up so that the set of tasks that
such functions can perform can be extended over time without SQL-level
changes. For example, we could say that the function takes a single
Node* argument, and that the type of Node tells it what to do, and if it
doesn't recognize the type of Node it should just return NULL indicating
"use default handling". We'd start out with two relevant Node types,
one for the selectivity-estimation case and one for the extract-a-lossy-
index-qual case, and we could add more over time.
What we can do to attach such a support function to a target function
is to repurpose the pg_proc.protransform column to represent the
support function. The existing protransform functions already have
nearly the sort of API I'm thinking about, but they only accept
FuncExpr* not any other node type. It'd be easy to change them
though, because there's only about a dozen and they are all in core;
we never invented any way for extensions to access that functionality.
(So actually, the initial API spec here would include three
possibilities, the third one being equivalent to the current
protransform behavior.)
As for attaching support functions to operators, we could
consider widening the pg_operator catalog to add a new column.
But I think it might be a cleaner answer to just say "use the support
function attached to the operator's implementation function,
if there is one". This would require that the support functions
be able to cope with either FuncExpr or OpExpr inputs, but that
does not seem like much of a burden as long as it's part of the
API spec from day one.
Since there isn't any SQL API for attaching support functions,
we'd have to add one, but adding another clause to CREATE FUNCTION
isn't all that hard. (Annoyingly, we haven't created any cheaply
extensible syntax for CREATE FUNCTION, so this'd likely require
adding another keyword. I'm not interested in doing more than
that right now, though.)
I'd be inclined to rename pg_proc.protransform to "prosupport"
to reflect its wider responsibility, and make the new CREATE FUNCTION
clause be "SUPPORT FUNCTION foo" or some such. I'm not wedded
to that terminology, if anyone has a better idea.
One thing that's not entirely clear to me is what permissions would be
required to use that clause. The support functions will have signature
"f(internal) returns internal", so creating them at all will require
superuser privilege, but it seems like we probably also need to restrict
the ability to attach one to a target function --- attaching one to
the wrong function could probably have bad consequences. The easy way
out is to say "you must be superuser"; maybe that's enough for now,
since all the plausible use-cases for this are in extensions containing
C functions anyway. (A support function would have to be coded in C,
although it seems possible that its target function could be something
else.)
Thoughts? If we have agreement on this basic design, making it happen
seems like a pretty straightforward task.
regards, tom lane
PS: there is, however, a stumbling block that I'll address in a separate
message, as it seems independent of this infrastructure.
[1]: /messages/by-id/CACowWR0TXXL0NfPMW2afCKzX++nHHBZLW3-BLusu_B0WjBB1=A@mail.gmail.com
I wrote:
What I'm envisioning therefore is that we allow an auxiliary function to
be attached to any operator or function that can provide functionality
like this, and that we set things up so that the set of tasks that
such functions can perform can be extended over time without SQL-level
changes.
Here are some draft patches in pursuit of this goal.
0001 redefines the API for protransform functions, renames that pg_proc
column to prosupport, and likewise renames the existing transform
functions to be xxx_support. There are no actual functionality changes
in this step. I needed to reindent the existing code in the transform
functions, so for ease of review, the -review patch uses "git diff -b"
to suppress most of the reindentation diffs. If you want to actually
apply the patch for testing, use the -apply version.
Possibly worth noting is that I chose to just remove
timestamp_zone_transform and timestamp_izone_transform, rather than
change them from one no-op state to another. We left them in place in
commit c22ecc656 to avoid a catversion bump, but that argument no longer
applies, and there seems little likelihood that we'll need them soon.
0002 adds the ability to attach a support function via CREATE/ALTER
FUNCTION, and adds the necessary pg_dump and ruleutils support for that.
The only thing that's not pretty mechanical about that is that ALTER
FUNCTION needs the ability to replace a dependency on a previous
support function. For that, we should use changeDependencyFor() ...
but there's a problem, which is that that function can't cope with
the case where the existing dependency is on a pinned object.
We'd left that unimplemented, arguing that it wasn't really necessary
for the existing usage of that function to change schema dependencies.
But it seems fairly likely that the case would occur for support
functions, so I went ahead and fixed changeDependencyFor() to handle
it. That leads to a change in the alter_table regression test, which
was pedantically verifying that the limitation existed.
(We could alternatively leave out the ability to set this option in
ALTER FUNCTION, requiring people to use CREATE OR REPLACE FUNCTION
for it. But I'm figuring that extension update scripts will want to
add support functions to existing functions, so it'd be tedious to not
be able to do it with a simple ALTER.)
0003 is where something useful happens. It extends the API to allow
support functions to define the selectivity estimates, cost estimates,
and rowcount estimates (for set-returning functions) of their target
functions. I can't overstate how important this is: it's retiring
technical debt that has been there for decades. As proof of concept,
there is a quick hack in the regression tests that teaches the planner
to make accurate rowcount estimates for generate_series(int, int)
with constant or estimatable arguments.
There's a considerable amount of follow-up work that ought to happen
now to make use of these capabilities for places that have been
pain points in the past, such as generate_series() and unnest().
But I haven't touched that yet.
Still to be done is to provide an API responding to Paul's original
problem, i.e. allowing an extension to generate lossy index clauses
when one of its operators or functions appears in WHERE. That's
going to be more complex than 0003 --- for one thing, I think I'd
like to try to refactor the existing hard-wired cases in indxpath.c
so that they live in datatype-specific support functions instead of
the core index code.
But first, I'd like to push forward with committing what I've got.
I think this is pretty damn compelling already, even if nothing
further got done for v12. Is anybody interested in reviewing?
regards, tom lane
Attachments:
v1-0001-change-protransform-API-review.patch (text/x-diff, +367/-184)
v1-0001-change-protransform-API-apply.patch (text/x-diff, +444/-261)
v1-0002-add-sql-support.patch (text/x-diff, +254/-26)
v1-0003-add-selectivity-etc.patch (text/x-diff, +585/-76)
I wrote:
There's a considerable amount of follow-up work that ought to happen
now to make use of these capabilities for places that have been
pain points in the past, such as generate_series() and unnest().
But I haven't touched that yet.
Attached is an 0004 that makes a stab at providing some intelligence
for unnest() and the integer cases of generate_series(). This only
affects one plan choice in the existing regression tests; I tweaked
that test to keep the plan the same. I didn't add new test cases
demonstrating the functionality, since it's a bit hard to show it
directly within the constraints of EXPLAIN (COSTS OFF). We could
do something along the lines of the quick-hack rowcount test in 0003,
perhaps, but that's pretty indirect.
Looking at this, I'm dissatisfied with the amount of new #include's
being dragged into datatype-specific .c files. I don't really want
to end up with most of utils/adt/ having dependencies on planner
data structures, but that's where we would be headed. I can think
of a couple of possibilities:
* Instead of putting support functions beside their target function,
group all the core's support functions into one new .c file. I'm
afraid this would lead to the reverse problem of having to import
lots of datatype-private info into that file.
* Try to refactor the planner's .h files so that there's just one
"external use" header providing stuff like estimate_expression_value,
while keeping PlannerInfo as an opaque struct. Then importing that
into utils/adt/ files would not represent such a big dependency
footprint.
I find the second choice more appealing, though it's getting a bit
far afield from where this started. OTOH, lots of other header
refactoring is going on right now, so why not ...
Thoughts?
regards, tom lane
Attachments:
v1-0004-unnest-and-gen-series-support.patch (text/x-diff, +203/-12)
On Sun, 20 Jan 2019 at 23:48, Tom Lane <tgl@sss.pgh.pa.us> wrote:
What I'm envisioning therefore is that we allow an auxiliary function to
be attached to any operator or function that can provide functionality
like this, and that we set things up so that the set of tasks that
such functions can perform can be extended over time without SQL-level
changes. For example, we could say that the function takes a single
Node* argument, and that the type of Node tells it what to do, and if it
doesn't recognize the type of Node it should just return NULL indicating
"use default handling". We'd start out with two relevant Node types,
one for the selectivity-estimation case and one for the extract-a-lossy-
index-qual case, and we could add more over time.
Does this help with these cases?
* Allow a set returning function to specify number of output rows, in cases
where that is variable and dependent upon the input params?
* Allow a normal term to match a functional index, e.g. WHERE x =
'abcdefgh' => WHERE substr(x, 1 , 5) = 'abcde' AND x = 'abcdefgh'
* Allow us to realise that ORDER BY f(x) => ORDER BY x so we can use
ordered paths from indexes, or avoid sorts.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Simon Riggs <simon@2ndquadrant.com> writes:
On Sun, 20 Jan 2019 at 23:48, Tom Lane <tgl@sss.pgh.pa.us> wrote:
What I'm envisioning therefore is that we allow an auxiliary function ...
Does this help with these cases?
* Allow a set returning function to specify number of output rows, in cases
where that is variable and dependent upon the input params?
Yes, within the usual limits of what the planner can know. The 0004
patch I posted yesterday correctly estimates the number of rows for
constant-arguments cases of generate_series() and unnest(anyarray),
and it also understands unnest(array[x,y,z,...]) even when some of the
array[] elements aren't constants. There's room to add knowledge about
other SRFs, but those are the cases I can recall hearing complaints about.
* Allow a normal term to match a functional index, e.g. WHERE x =
'abcdefgh' => WHERE substr(x, 1 , 5) = 'abcde' AND x = 'abcdefgh'
I'm a bit confused about what you think this example means. I do
intend to work on letting extensions define rules for extracting
index clauses from function calls, because that's the requirement
that PostGIS is after in the thread that started this. I don't
know whether that would satisfy your concern, because I'm not clear
on what your concern is.
* Allow us to realise that ORDER BY f(x) => ORDER BY x so we can use
ordered paths from indexes, or avoid sorts.
Hm. That's not part of what I'm hoping to get done for v12, but you
could imagine a future extension to add a support request type that
allows deriving related pathkeys. There would be a lot of work to do
to make that happen, but the aspect of it that requires adding
function-specific knowledge could usefully be packaged as a
support-function request.
regards, tom lane
On Sat, Jan 26, 2019 at 12:35 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Attached is an 0004 that makes a stab at providing some intelligence
for unnest() and the integer cases of generate_series().
That looks awesome.
I'm somewhat dubious about the whole API. It's basically -- if you have a
problem and a PhD in PostgreSQL-ology, you can write some C code to
fix it. On the other hand, the status quo is that you may as well
just forget about fixing it, which is clearly even worse. And I don't
really know how to do better.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
On Sat, Jan 26, 2019 at 12:35 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Attached is an 0004 that makes a stab at providing some intelligence
for unnest() and the integer cases of generate_series().
That looks awesome.
I'm somewhat dubious about the whole API. It's basically -- if you have a
problem and a PhD in PostgreSQL-ology, you can write some C code to
fix it. On the other hand, the status quo is that you may as well
just forget about fixing it, which is clearly even worse. And I don't
really know how to do better.
Well, you need to be able to write a C extension :-(. I kinda wish
that were not a requirement, but in practice I think the main audience
is people like PostGIS, who already cleared that bar. I hope that
we'll soon have a bunch of examples, like those in the 0004 patch,
that people can look at to see how to do things in this area. I see
no reason to believe it'll be all that much harder than anything
else extension authors have to do.
regards, tom lane
On Sun, 27 Jan 2019 at 19:17, Tom Lane <tgl@sss.pgh.pa.us> wrote:
* Allow a normal term to match a functional index, e.g. WHERE x =
'abcdefgh' => WHERE substr(x, 1 , 5) = 'abcde' AND x = 'abcdefgh'
I'm a bit confused about what you think this example means. I do
intend to work on letting extensions define rules for extracting
index clauses from function calls, because that's the requirement
that PostGIS is after in the thread that started this. I don't
know whether that would satisfy your concern, because I'm not clear
on what your concern is.
To be able to extract indexable clauses where none existed before.
Hash functions assume that x = N => hash(x) = hash(N) AND x = N
so I want to be able to assume
x = K => f(x) = f(K) AND x = K
for specific f()
to allow indexable operations when we have an index on f(x) only
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Simon Riggs <simon@2ndquadrant.com> writes:
On Sun, 27 Jan 2019 at 19:17, Tom Lane <tgl@sss.pgh.pa.us> wrote:
... I don't
know whether that would satisfy your concern, because I'm not clear
on what your concern is.
To be able to extract indexable clauses where none existed before.
That's a pretty vague statement, because it describes what I want
to do perfectly, but this doesn't:
Hash functions assume that x = N => hash(x) = hash(N) AND x = N
so I want to be able to assume
x = K => f(x) = f(K) AND x = K
for specific f()
to allow indexable operations when we have an index on f(x) only
The problem with that is that if the only thing that's in the query is
"x = K" then there is nothing to cue the planner that it'd be worth
expending cycles thinking about f(x). Sure, you could hang a planner
support function on the equals operator that would go off and expend
arbitrary amounts of computation looking for speculative matches ...
but nobody is going to accept that as a patch, because the cost/benefit
ratio is going to be awful for 99% of users.
The mechanism I'm proposing is based on the thought that for
specialized functions (or operators) like PostGIS' ST_Intersects(),
it'll be worth expending extra cycles when one of those shows up
in WHERE. I don't think that scales to plain-vanilla equality though.
Conceivably, you could turn that around and look for support functions
attached to the functions/operators that are in an index expression,
and give them the opportunity to derive lossy indexquals based on
comparing the index expression to query quals. I have no particular
interest in working on that right now, because it doesn't respond to
what I understand PostGIS' need to be, and there are only so many
hours in the day. But maybe it could be made workable in the future.
regards, tom lane
On Tue, 29 Jan 2019 at 09:55, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Simon Riggs <simon@2ndquadrant.com> writes:
On Sun, 27 Jan 2019 at 19:17, Tom Lane <tgl@sss.pgh.pa.us> wrote:
... I don't
know whether that would satisfy your concern, because I'm not clear
on what your concern is.
To be able to extract indexable clauses where none existed before.
That's a pretty vague statement, because it describes what I want
to do perfectly, but this doesn't:
Hash functions assume that x = N => hash(x) = hash(N) AND x = N
so I want to be able to assume
x = K => f(x) = f(K) AND x = K
for specific f()
to allow indexable operations when we have an index on f(x) only
The problem with that is that if the only thing that's in the query is
"x = K" then there is nothing to cue the planner that it'd be worth
expending cycles thinking about f(x).
I agree. That is the equivalent of a SeqScan; the wrong way to approach it.
Sure, you could hang a planner
support function on the equals operator that would go off and expend
arbitrary amounts of computation looking for speculative matches ...
but nobody is going to accept that as a patch, because the cost/benefit
ratio is going to be awful for 99% of users.
The mechanism I'm proposing is based on the thought that for
specialized functions (or operators) like PostGIS' ST_Intersects(),
it'll be worth expending extra cycles when one of those shows up
in WHERE. I don't think that scales to plain-vanilla equality though.
Conceivably, you could turn that around and look for support functions
attached to the functions/operators that are in an index expression,
and give them the opportunity to derive lossy indexquals based on
comparing the index expression to query quals.
That way around is the right way. If an index exists, explore whether it
can be used or not. If there are no indexes with appropriate support
functions, it will cost almost nothing to normal queries.
The problem of deriving potentially useful indexes is more expensive, I
understand.
I have no particular
interest in working on that right now, because it doesn't respond to
what I understand PostGIS' need to be, and there are only so many
hours in the day. But maybe it could be made workable in the future.
I thought the whole exercise was about adding generic tools for everybody
to use. The Tom I've worked with for more than a few years would not have
said that; that is my normal line! You said PostGIS was looking to
"automatically convert WHERE clauses into lossy index quals." which sounds
very similar to what I outlined.
Either way, thanks.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Simon Riggs <simon@2ndquadrant.com> writes:
On Tue, 29 Jan 2019 at 09:55, Tom Lane <tgl@sss.pgh.pa.us> wrote:
I have no particular
interest in working on that right now, because it doesn't respond to
what I understand PostGIS' need to be, and there are only so many
hours in the day. But maybe it could be made workable in the future.
I thought the whole exercise was about adding generic tools for everybody
to use.
Well, I'm building infrastructure plus a small subset of what might
someday sit atop that infrastructure. I'm not prepared to commit
right now to building stuff I can't finish for v12.
You said PostGIS was looking to
"automatically convert WHERE clauses into lossy index quals." which sounds
very similar to what I outlined.
As I understand it, what they have is complex WHERE clauses from which
they want to extract clauses usable with simple (non-expression) indexes.
The case you seem to be worried about is the reverse: complicated index
definition and simple WHERE clause. I think we're agreed that these two
cases can't be solved with the very same facility. The support-function
mechanism probably can be used to provide extensibility for logic that
tries to attack the complicated-index case, but its mere existence won't
cause that logic to spring into being.
regards, tom lane
Just to show I'm not completely crazy, here's a more or less
feature-complete patch set for doing $SUBJECT.
Patches 0001-0005 are the same as previously posted, either in
this thread or <22182.1549124950@sss.pgh.pa.us>, but rebased
over the planner header refactoring I committed recently.
Patch 0006 is the new work: it removes all the "special index
operator" cruft from indxpath.c and puts it into planner support
functions. I need to write (a lot) more about the API specification
for this support request type, but I think the code is pretty much OK.
I'm still dithering about where to put these planner support functions.
0006 drops them into a new file "utils/adt/likesupport.c", but I'm
not sold on that as a final answer. The LIKE and regex support
functions should share code, but the execution functions for those
are in different files (like.c and regexp.c), so the "put it beside the
execution function" heuristic isn't much help. Also, those functions
rely on the pattern_fixed_prefix() functionality that's currently in
selfuncs.c. I'd kind of like to end up with that in the same file
as its callers. In any case, the network-subset support code need
not stay beside the LIKE/regex functions, but I didn't bother to
find a new home for it yet.
Another thing worth commenting about is that I'd intended to have
all the LIKE/regex functions share one support function, using a
switch on function OID to determine what to do exactly, much as the
existing code used a switch on operator OID. That crashed and
burned though, because some of those functions have multiple aliases
in pg_proc, but fmgroids.h has a macro for only one of the aliases.
Maybe it's time to do something about that? The factorization I used
instead, with a separate support function for each pattern-matching
rule, isn't awful; but I can foresee that this won't be a great answer
for all cases.
Barring objections, I hope to push forward and commit this soon.
regards, tom lane
Attachments:
v2-0001-change-protransform-API-apply.patch (text/x-diff, +444/-261)
v2-0001-change-protransform-API-review.patch (text/x-diff, +367/-184)
v2-0002-add-sql-support.patch (text/x-diff, +254/-26)
v2-0003-add-selectivity-etc.patch (text/x-diff, +585/-77)
v2-0004-unnest-and-gen-series-support.patch (text/x-diff, +206/-12)
v2-0005-refactor-indexpath-representation.patch (text/x-diff, +1127/-1058)
v2-0006-generate-index-conditions.patch (text/x-diff, +1688/-1291)
On Mon, Jan 28, 2019 at 9:51 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
is people like PostGIS, who already cleared that bar. I hope that
we'll soon have a bunch of examples, like those in the 0004 patch,
that people can look at to see how to do things in this area. I see
no reason to believe it'll be all that much harder than anything
else extension authors have to do.
It's a little harder :)
So... trying to figure out how to use SupportRequestIndexCondition to
convert a call to Intersects() into a call that also has the operator
&& as well.
Looking at the examples, they are making use of the opfamily that
comes in SupportRequestIndexCondition.opfamily.
That opfamily Oid is the first one in the IndexOptInfo.opfamily array.
Here's where my thread of understanding fails to follow. I have, in
PostGIS, actually no operator families defined (CREATE OPERATOR
FAMILY). I do, however, have quite a few operator classes defined for
geometry: 10, actually!
btree_geometry_ops
hash_geometry_ops
gist_geometry_ops_2d
gist_geometry_ops_nd
brin_geometry_inclusion_ops_2d
brin_geometry_inclusion_ops_3d
brin_geometry_inclusion_ops_4d
spgist_geometry_ops_2d
spgist_geometry_ops_nd
spgist_geometry_ops_nd
Some of those are not useful to me (btree, hash) for sure.
Some of them (gist_geometry_ops_2d, spgist_geometry_ops_2d ) use the
&& operator to indicate the lossy operation I would like to combine
with ST_Intersects.
Some of them (gist_geometry_ops_nd, spgist_geometry_ops_nd) use the
&&& operator to indicate the lossy operation I would like to combine
with ST_Intersects.
A given call to ST_Intersects(tbl1.geom, tbl2.geom) could have two
indexes to apply to the problem, but
SupportRequestIndexCondition.opfamily will, I assume, only be exposing
one to me: which one?
Anyways, to true up how hard this is, I've been carefully reading the
implementations for network address types and LIKE, and I'm still
barely at the WTF stage. The selectivity and the number of rows
support modes I could do. The SupportRequestIndexCondition is based on
a detailed knowledge of what an operator family is, an operator class
is, how those relate to types... I think I can get there, but it's
going to be far from easy (for me). And it'll put a pretty high bar in
front of anyone who previously just whacked an inline SQL function in
place to get an index-assisted function up and running.
P.
Paul Ramsey <pramsey@cleverelephant.ca> writes:
So... trying to figure out how to use SupportRequestIndexCondition to
convert a call to Intersects() into a call that also has the operator
&& as well.
OK.
Looking at the examples, they are making use of the opfamily that
comes in SupportRequestIndexCondition.opfamily.
That opfamily Oid is the first one in the IndexOptInfo.opfamily array.
Here's where my thread of understanding fails to follow. I have, in
PostGIS, actually no operator families defined (CREATE OPERATOR
FAMILY). I do, however, have quite a few operator classes defined for
geometry: 10, actually!
Yes, you do have operator families: there's no such thing as an operator
class without a containing operator family, and hasn't been for quite
a long time. If you write CREATE OPERATOR CLASS without a FAMILY
clause, the command silently creates an opfamily with the same name you
specified for the opclass, and links the two together.
Some of them (gist_geometry_ops_2d, spgist_geometry_ops_2d ) use the
&& operator to indicate the lossy operation I would like to combine
with ST_Intersects.
Some of them (gist_geometry_ops_nd, spgist_geometry_ops_nd) use the
&&& operator to indicate the lossy operation I would like to combine
with ST_Intersects.
Right. So the hard part here is to figure out whether the OID you're
handed matches one of these operator families. As I mentioned in the
other thread (/messages/by-id/22876.1550591107@sss.pgh.pa.us; maybe
you didn't see it?), the best short-term
idea I've got for that is to look up the opfamily by OID (see the
OPFAMILYOID syscache) and check to see if its name matches one of
the above. You might want to verify that the index AM's OID is what
you expect, too, just for a little extra safety.
A given call to ST_Intersects(tbl1.geom, tbl2.geom) could have two
indexes to apply to the problem, but
SupportRequestIndexCondition.opfamily will, I assume, only be exposing
one to me: which one?
It's whichever one the index column's opclass belongs to. Basically what
you're trying to do here is verify whether the index will support the
optimization you want to perform.
Anyways, to true up how hard this is, I've been carefully reading the
implementations for network address types and LIKE, and I'm still
barely at the WTF stage. The selectivity and the number of rows
support modes I could do. The SupportRequestIndexCondition is based on
a detailed knowledge of what an operator family is, an operator class
is, how those relate to types... I think I can get there, but it's
going to be far from easy (for me).
You definitely want to read this:
https://www.postgresql.org/docs/devel/xindex.html#XINDEX-OPFAMILY
and maybe some of the surrounding sections.
And it'll put a pretty high bar in
front of anyone who previously just whacked an inline SQL function in
place to get an index-assisted function up and running.
Sure, but that was a pretty lame way of getting the optimization,
as you well know because you've been fighting its deficiencies for
so long.
Perhaps at some point we'll have some infrastructure that makes this
less painful, but it's not happening for v12.
regards, tom lane
On Mon, Feb 25, 2019 at 3:01 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Looking at the examples, they are making use of the opfamily that
comes in SupportRequestIndexCondition.opfamily.
That opfamily Oid is the first one in the IndexOptInfo.opfamily array.
Here's where my thread of understanding fails to follow. I have, in
PostGIS, actually no operator families defined (CREATE OPERATOR
FAMILY). I do, however, have quite a few operator classes defined for
geometry: 10, actually!
Yes, you do have operator families: there's no such thing as an operator
class without a containing operator family, and hasn't been for quite
a long time. If you write CREATE OPERATOR CLASS without a FAMILY
clause, the command silently creates an opfamily with the same name you
specified for the opclass, and links the two together.
OK, starting to understand...
Some of them (gist_geometry_ops_2d, spgist_geometry_ops_2d ) use the
&& operator to indicate the lossy operation I would like to combine
with ST_Intersects.
Some of them (gist_geometry_ops_nd, spgist_geometry_ops_nd) use the
&&& operator to indicate the lossy operation I would like to combine
with ST_Intersects.Right. So the hard part here is to figure out whether the OID you're
handed matches one of these operator families. As I mentioned (in
the other thread [1], maybe you didn't see it?) the best short-term
idea I've got for that is to look up the opfamily by OID (see the
OPFAMILYOID syscache) and check to see if its name matches one of
the above. You might want to verify that the index AM's OID is what
you expect, too, just for a little extra safety.
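The lookup-and-match check described above can be sketched like this. This is Python rather than backend C, purely to show the shape of the logic: the dict stands in for the OPFAMILYOID syscache, and the opfamily OIDs (16501 etc.) are invented; only the opfamily names and AM OIDs (783 for GiST, 403 for btree in a stock install) come from the discussion and the catalogs.

```python
GIST_AM_OID = 783   # pg_am OID for GiST in a stock install
BTREE_AM_OID = 403  # pg_am OID for btree

# Stand-in for the OPFAMILYOID syscache: opfamily OID -> (name, index AM OID).
# The OIDs here are made up for illustration.
OPFAMILY_CATALOG = {
    16501: ("gist_geometry_ops_2d", GIST_AM_OID),
    16502: ("gist_geometry_ops_nd", GIST_AM_OID),
    16503: ("btree_geometry_ops", BTREE_AM_OID),
}

# The 2D families whose && operator we could pair with ST_Intersects
SUPPORTED_2D_FAMILIES = {"gist_geometry_ops_2d", "spgist_geometry_ops_2d"}

def opfamily_is_supported(opfamily_oid):
    """Return True if this opfamily is one we can build a && condition for."""
    row = OPFAMILY_CATALOG.get(opfamily_oid)
    if row is None:
        return False  # unknown opfamily: decline and let default handling run
    name, am_oid = row
    # check the index AM too, just for a little extra safety
    return name in SUPPORTED_2D_FAMILIES and am_oid == GIST_AM_OID

assert opfamily_is_supported(16501)       # 2D GiST family: usable
assert not opfamily_is_supported(16503)   # btree family: no && operator there
```

In real C the lookup would go through the syscache by OID and compare the name from the pg_opfamily tuple; the point of the sketch is only the name-plus-AM match.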
I read it, I just didn't entirely understand it. I think maybe I do
know? I'm reading and re-reading everything and trying to build a
mental model that makes sense :)
Back to SupportRequestIndexCondition.opfamily though:
It's whichever one the index column's opclass belongs to. Basically what
you're trying to do here is verify whether the index will support the
optimization you want to perform.
* If I have tbl1.geom
* and I have built two indexes on it, a btree_geometry_ops and a
gist_geometry_ops_2d, and
* and SupportRequestIndexCondition.opfamily returns me the btree family
* and I look and see, "damn there is no && operator in there"
* am I SOL, even though an appropriate index does exist?
Sure, but that was a pretty lame way of getting the optimization,
as you well know because you've been fighting its deficiencies for
so long.
Hrm. :) I will agree to disagree. This is an intellectually
interesting journey, but most of its length is quite far removed from
our proximate goal of adding realistic costs to our functions, and the
code added will be quite a bit harder for folks to follow than what it
replaces.
Reading your code is a pleasure and the comments are great, it's just
a hard slog up for someone who is still going "Node*, hm, how does
that work..."
ATB,
P
Perhaps at some point we'll have some infrastructure that makes this
less painful, but it's not happening for v12.
regards, tom lane
Paul Ramsey <pramsey@cleverelephant.ca> writes:
On Mon, Feb 25, 2019 at 3:01 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
It's whichever one the index column's opclass belongs to. Basically what
you're trying to do here is verify whether the index will support the
optimization you want to perform.
* If I have tbl1.geom
* and I have built two indexes on it, a btree_geometry_ops and a
gist_geometry_ops_2d, and
* and SupportRequestIndexCondition.opfamily returns me the btree family
* and I look and see, "damn there is no && operator in there"
* am I SOL, even though an appropriate index does exist?
No. If there are two indexes matching your function's argument, you'll
get a separate call for each index. The support function is only
responsible for thinking about one index at a time and seeing if it
can be used. If more than one can be used, figuring out which
one is better is done by later cost comparisons.
regards, tom lane
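The one-call-per-index protocol Tom describes can be mocked as follows. This is a Python sketch of the control flow only, not planner code; the function and opfamily names are invented stand-ins.

```python
def support_fn(index_opfamily):
    """Mock support function: return an index condition string, or None
    meaning 'this index can't be used for my optimization'."""
    if index_opfamily == "gist_geometry_ops_2d":
        return "g && 'POINT(0 0)'"  # lossy condition we can offer
    return None  # e.g. a btree opfamily: no && operator, so decline

# The planner calls the support function separately for each matching index
# and only keeps the indexes for which a condition came back.
indexes = ["btree_geometry_ops", "gist_geometry_ops_2d"]
conditions = {ix: support_fn(ix) for ix in indexes}
usable = [ix for ix, cond in conditions.items() if cond is not None]

# Declining one index doesn't make you SOL: the GiST index still qualifies.
# If several qualified, later cost comparisons would pick the winner.
assert usable == ["gist_geometry_ops_2d"]
```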
On Mon, Feb 25, 2019 at 4:09 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Paul Ramsey <pramsey@cleverelephant.ca> writes:
On Mon, Feb 25, 2019 at 3:01 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
It's whichever one the index column's opclass belongs to. Basically what
you're trying to do here is verify whether the index will support the
optimization you want to perform.
* If I have tbl1.geom
* and I have built two indexes on it, a btree_geometry_ops and a
gist_geometry_ops_2d, and
* and SupportRequestIndexCondition.opfamily returns me the btree family
* and I look and see, "damn there is no && operator in there"
* am I SOL, even though an appropriate index does exist?
No. If there are two indexes matching your function's argument, you'll
get a separate call for each index. The support function is only
responsible for thinking about one index at a time and seeing if it
can be used. If more than one can be used, figuring out which
one is better is done by later cost comparisons.
Ah, wonderful!
New line of questioning: under what conditions will the support
function be called in a T_SupportRequestIndexCondition mode? I have
created a table (foo), a geometry column (g), and an index (GIST on
foo(g)), and am running a query against foo using a no-op function with
a support function bound to it.
The support function is called twice: once in
T_SupportRequestSimplify mode and once in T_SupportRequestCost mode.
What triggers T_SupportRequestIndexCondition mode?
Thanks!
P
Paul Ramsey <pramsey@cleverelephant.ca> writes:
New line of questioning: under what conditions will the support
function be called in a T_SupportRequestIndexCondition mode?
It'll be called if the target function appears at top level of a
WHERE or JOIN condition and any one of the function's arguments
syntactically matches some column of an index.
If there are multiple arguments matching the same index column, say
index on "x" and we have "f(z, x, x)", you'll get one call and
it will tell you about the first match (req->indexarg == 1 in
this example). Sorting out what to do in such a case is your
responsibility.
If there are arguments matching more than one index column, say
index declared on (x, y) and we have "f(x, y)", you'll get a
separate call for each index column. Again, sorting out what
to do for each one is your responsibility.
In most cases, multiple matching arguments are going to lead to
failure to construct any useful index condition, because your
comparison value has to be a pseudoconstant (ie, not a variable
from the same table, so in both of the above examples there's
no function argument you could compare to). But we don't prejudge
that, because it's possible that a function with 3 or more arguments
could produce something useful anyway. For instance, if what we've
got is "f(x, y, constant)" then it's possible that the semantics of
the function are such that y can be ignored and we can make something
indexable like "x && constant". All this is the support function's
job to know.
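The indexarg rules above can be captured in a small sketch. Again this is illustrative Python, not the planner's actual matching code; it just enumerates the (index column, indexarg) calls the support function would see for the two examples Tom gives, using simple string equality as a stand-in for "syntactically matches".

```python
def planner_calls(func_args, index_columns):
    """Enumerate (index_column, indexarg) pairs the planner would make:
    one call per index column that some function argument matches,
    reporting the position of the FIRST matching argument (0-based)."""
    calls = []
    for col in index_columns:
        for pos, arg in enumerate(func_args):
            if arg == col:  # stand-in for a syntactic match on the column
                calls.append((col, pos))
                break  # only the first match is reported for this column
    return calls

# Index on "x" and f(z, x, x): one call, and indexarg == 1 (the first match).
assert planner_calls(["z", "x", "x"], ["x"]) == [("x", 1)]

# Index on (x, y) and f(x, y): a separate call for each index column.
assert planner_calls(["x", "y"], ["x", "y"]) == [("x", 0), ("y", 1)]
```

Deciding what to do with each call, including whether a usable pseudoconstant comparison value exists, remains the support function's job, exactly as described above.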
I have
created a table (foo) a geometry column (g) and an index (GIST on
foo(g)) and am running a query against foo using a noop function with
a support function bound to it.
The support function is called, twice, once in
T_SupportRequestSimplify mode and once in T_SupportRequestCost mode.
What's the query look like exactly? The other two calls will occur
anyway, but SupportRequestIndexCondition depends on the function
call's placement.
regards, tom lane
On Feb 26, 2019, at 2:19 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
I have
created a table (foo) a geometry column (g) and an index (GIST on
foo(g)) and am running a query against foo using a noop function with
a support function bound to it.
The support function is called, twice, once in
T_SupportRequestSimplify mode and once in T_SupportRequestCost mode.
What's the query look like exactly? The other two calls will occur
anyway, but SupportRequestIndexCondition depends on the function
call's placement.
select geos_intersects_new(g, 'POINT(0 0)') from foo;
regards, tom lane
Paul Ramsey <pramsey@cleverelephant.ca> writes:
On Feb 26, 2019, at 2:19 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
What's the query look like exactly? The other two calls will occur
anyway, but SupportRequestIndexCondition depends on the function
call's placement.
select geos_intersects_new(g, 'POINT(0 0)') from foo;
Right, so that's not useful for an index scan. Try
select * from foo where geos_intersects_new(g, 'POINT(0 0)').
regards, tom lane