pgsql: Generalize hash and ordering support in amapi

Started by Peter Eisentrautover 1 year ago10 messagescomitters
Jump to latest
#1Peter Eisentraut
peter_e@gmx.net

Generalize hash and ordering support in amapi

Stop comparing access method OID values against HASH_AM_OID and
BTREE_AM_OID, and instead check the IndexAmRoutine for an index to see
if it advertises its ability to perform the necessary ordering,
hashing, or cross-type comparing functionality. A field amcanorder
already existed, this uses it more widely. Fields amcanhash and
amcancrosscompare are added for the other purposes.

Author: Mark Dilger <mark.dilger@enterprisedb.com>
Discussion: /messages/by-id/E72EAA49-354D-4C2E-8EB9-255197F55330@enterprisedb.com

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/ce62f2f2a0a48d021f250ba84dfcab5d45ddc914

Modified Files
--------------
contrib/bloom/blutils.c | 2 ++
doc/src/sgml/indexam.sgml | 4 +++
src/backend/access/brin/brin.c | 2 ++
src/backend/access/gin/ginutil.c | 2 ++
src/backend/access/gist/gist.c | 2 ++
src/backend/access/hash/hash.c | 2 ++
src/backend/access/nbtree/nbtree.c | 2 ++
src/backend/access/spgist/spgutils.c | 2 ++
src/backend/commands/opclasscmds.c | 34 ++++++++++++------------
src/backend/executor/nodeIndexscan.c | 4 +--
src/backend/utils/cache/lsyscache.c | 8 +++---
src/include/access/amapi.h | 4 +++
src/test/modules/dummy_index_am/dummy_index_am.c | 2 ++
src/test/regress/expected/alter_generic.out | 6 ++---
14 files changed, 50 insertions(+), 26 deletions(-)

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Eisentraut (#1)
Re: pgsql: Generalize hash and ordering support in amapi

Peter Eisentraut <peter@eisentraut.org> writes:

Generalize hash and ordering support in amapi
Stop comparing access method OID values against HASH_AM_OID and
BTREE_AM_OID, and instead check the IndexAmRoutine for an index to see
if it advertises its ability to perform the necessary ordering,
hashing, or cross-type comparing functionality. A field amcanorder
already existed, this uses it more widely. Fields amcanhash and
amcancrosscompare are added for the other purposes.

AFAICS, this patch sets amcancrosscompare true only for btree,
which means this change to equality_ops_are_compatible is surely wrong:

-       /* must be btree or hash */
-       if (op_form->amopmethod == BTREE_AM_OID ||
-           op_form->amopmethod == HASH_AM_OID)
+       if (amroutine->amcancrosscompare)

More generally, I think that "amcancrosscompare" is a horribly
underspecified and misleadingly-named concept. Most of our
AMs are capable of supporting cross-type operators, though for
some it's more about what the per-opclass infrastructure can do
than what the AM does. So what does that flag *really* mean?

I also object strongly to the fact that the comments for
equality_ops_are_compatible and comparison_ops_are_compatible
were not modified:

* This is trivially true if they are the same operator. Otherwise,
* we look to see if they can be found in the same btree or hash opfamily.

* This is trivially true if they are the same operator. Otherwise,
* we look to see if they can be found in the same btree opfamily.

I do not think this was in shape to be committed. For stuff
like this, clarity of thought and precision of specification
are critical, and this patch has neither.

regards, tom lane

#3Mark Dilger
mark.dilger@enterprisedb.com
In reply to: Tom Lane (#2)
Re: pgsql: Generalize hash and ordering support in amapi

On Thu, Feb 27, 2025 at 8:27 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Peter Eisentraut <peter@eisentraut.org> writes:

Generalize hash and ordering support in amapi
Stop comparing access method OID values against HASH_AM_OID and
BTREE_AM_OID, and instead check the IndexAmRoutine for an index to see
if it advertises its ability to perform the necessary ordering,
hashing, or cross-type comparing functionality. A field amcanorder
already existed, this uses it more widely. Fields amcanhash and
amcancrosscompare are added for the other purposes.

AFAICS, this patch sets amcancrosscompare true only for btree,
which means this change to equality_ops_are_compatible is surely wrong:

-       /* must be btree or hash */
-       if (op_form->amopmethod == BTREE_AM_OID ||
-           op_form->amopmethod == HASH_AM_OID)
+       if (amroutine->amcancrosscompare)

It seems you are right. hashhandler()'s amroutine should have this true,
also.

More generally, I think that "amcancrosscompare" is a horribly
underspecified and misleadingly-named concept. Most of our
AMs are capable of supporting cross-type operators, though for
some it's more about what the per-opclass infrastructure can do
than what the AM does. So what does that flag *really* mean?

The comments in amapi.h seem to have gotten shortened since v19 of the
patch which had them as:

+   /*
+    * Does AM support hashing the indexed column's value, including
providing
+    * all hash semantics functions for HASHSTANDARD_PROC and
HASHEXTENDED_PROC
+    * conforming to the same calling conventions as those of the hash AM?
+    */
+   bool        amcanhash;
+   /*
+    * Does AM have equality semantics that are compatible across all
equality
+    * operators within an operator family?
+    */
+   bool        amcancrosscompare;

The logic in equality_ops_are_compatible() was trusting that equality
operators found in an opfamily for btree or hash were ok, but not trusting
operators found in opfamilies of other AMs. Now, after the patch, other
AMs can be marked as suitable. That's really the core of what the flag
means: "Can the system trust that equality operators found in opfamilies
of the AM are well-behaved", or something like that. I agree that this
should really be more a feature of an opfamily than an AM, but the system
already does it on AM-level granularity, and I don't think it's the
responsibility of this patch to rearchitect all that.

I also object strongly to the fact that the comments for
equality_ops_are_compatible and comparison_ops_are_compatible
were not modified:

* This is trivially true if they are the same operator. Otherwise,
* we look to see if they can be found in the same btree or hash opfamily.

* This is trivially true if they are the same operator. Otherwise,
* we look to see if they can be found in the same btree opfamily.

I agree these comments need updating.

Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#4Peter Eisentraut
peter_e@gmx.net
In reply to: Mark Dilger (#3)
Re: pgsql: Generalize hash and ordering support in amapi

On 27.02.25 23:17, Mark Dilger wrote:

On Thu, Feb 27, 2025 at 8:27 AM Tom Lane <tgl@sss.pgh.pa.us
<mailto:tgl@sss.pgh.pa.us>> wrote:

Peter Eisentraut <peter@eisentraut.org
<mailto:peter@eisentraut.org>> writes:

Generalize hash and ordering support in amapi
Stop comparing access method OID values against HASH_AM_OID and
BTREE_AM_OID, and instead check the IndexAmRoutine for an index

to see

if it advertises its ability to perform the necessary ordering,
hashing, or cross-type comparing functionality.  A field amcanorder
already existed, this uses it more widely.  Fields amcanhash and
amcancrosscompare are added for the other purposes.

AFAICS, this patch sets amcancrosscompare true only for btree,
which means this change to equality_ops_are_compatible is surely wrong:

-       /* must be btree or hash */
-       if (op_form->amopmethod == BTREE_AM_OID ||
-           op_form->amopmethod == HASH_AM_OID)
+       if (amroutine->amcancrosscompare)

It seems you are right.  hashhandler()'s amroutine should have this
true, also.

I have fixed that. I will come back to the rest of the discussion in a bit.

#5Peter Eisentraut
peter_e@gmx.net
In reply to: Mark Dilger (#3)
Re: pgsql: Generalize hash and ordering support in amapi

On 27.02.25 23:17, Mark Dilger wrote:

The logic in equality_ops_are_compatible() was trusting that equality
operators found in an opfamily for btree or hash were ok, but not
trusting operators found in opfamilies of other AMs.  Now, after the
patch, other AMs can be marked as suitable.  That's really the core of
what the flag means:  "Can the system trust that equality operators
found in opfamilies of the AM are well-behaved", or something like
that.

Yeah, what might be a good English identifier for that?

I also object strongly to the fact that the comments for
equality_ops_are_compatible and comparison_ops_are_compatible
were not modified:

 * This is trivially true if they are the same operator.  Otherwise,
 * we look to see if they can be found in the same btree or hash
opfamily.

 * This is trivially true if they are the same operator.  Otherwise,
 * we look to see if they can be found in the same btree opfamily.

I agree these comments need updating.

Mark, can you suggest updated wording for those?

#6Mark Dilger
mark.dilger@enterprisedb.com
In reply to: Peter Eisentraut (#5)
Re: pgsql: Generalize hash and ordering support in amapi

On Tue, Mar 4, 2025 at 8:46 AM Peter Eisentraut <peter@eisentraut.org>
wrote:

On 27.02.25 23:17, Mark Dilger wrote:

The logic in equality_ops_are_compatible() was trusting that equality
operators found in an opfamily for btree or hash were ok, but not
trusting operators found in opfamilies of other AMs. Now, after the
patch, other AMs can be marked as suitable. That's really the core of
what the flag means: "Can the system trust that equality operators
found in opfamilies of the AM are well-behaved", or something like
that.

Yeah, what might be a good English identifier for that?

I also object strongly to the fact that the comments for
equality_ops_are_compatible and comparison_ops_are_compatible
were not modified:

* This is trivially true if they are the same operator. Otherwise,
* we look to see if they can be found in the same btree or hash
opfamily.

* This is trivially true if they are the same operator. Otherwise,
* we look to see if they can be found in the same btree opfamily.

I agree these comments need updating.

Mark, can you suggest updated wording for those?

Yes, happily, though I think I already did, in the v21 patch set. Here is
the meat of that patch:

- * This is trivially true if they are the same operator.  Otherwise,
- * we look to see if they can be found in the same btree or hash opfamily.
- * Either finding allows us to assume that they have compatible notions
- * of equality.  (The reason we need to do these pushups is that one might
- * be a cross-type operator; for instance int24eq vs int4eq.)
+ * This is trivially true if they are the same operator.  Otherwise, we
look to
+ * see if they can be found in the same opfamily of an index AM which
+ * advertises amcancrosscompare.  Either finding allows us to assume that
they
+ * have compatible notions of equality.  (The reason we need to do these
+ * pushups is that one might be a cross-type operator; for instance
int24eq vs
+ * int4eq.)

and

- * This is trivially true if they are the same operator.  Otherwise,
- * we look to see if they can be found in the same btree opfamily.
- * For example, '<' and '>=' ops match if they belong to the same family.
+ * This is trivially true if they are the same operator.  Otherwise, we
look to
+ * see if they can be found in the same opfamily of an index AM that
advertises
+ * both amcancrosscompare and amcanorder.  For example, '<' and '>=' ops
match
+ * if they belong to the same family.
  *
- * (This is identical to equality_ops_are_compatible(), except that we
- * don't bother to examine hash opclasses.)
+ * (This is identical to equality_ops_are_compatible(), except that we
don't
+ * bother to examine opclasses for index AMs which cannot do ordering,
such as
+ * the hash index AM.)

See v21-0003-Update-syscache-code-comments.patch for the whole thing,
including a commit message about why this is needed.

--

Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Eisentraut (#5)
Re: pgsql: Generalize hash and ordering support in amapi

Peter Eisentraut <peter@eisentraut.org> writes:

On 27.02.25 23:17, Mark Dilger wrote:

The logic in equality_ops_are_compatible() was trusting that equality
operators found in an opfamily for btree or hash were ok, but not
trusting operators found in opfamilies of other AMs. Now, after the
patch, other AMs can be marked as suitable. That's really the core of
what the flag means: "Can the system trust that equality operators
found in opfamilies of the AM are well-behaved", or something like
that.

Yeah, what might be a good English identifier for that?

Looking at the call sites, the callers of equality_ops_are_compatible
basically are interested in whether the two operators agree on which
values are distinct. That can be rephrased as "equality satisfies
the transitive law": if A = B according to one operator, while B = C
according to the other operator, then A = C according to some relevant
member of the opfamily (which might be yet a third operator).

The single caller of comparison_ops_are_compatible is
ineq_histogram_selectivity, which knows it is dealing with
inequality operators (<, >, <=, or >=), and what it wants to know
is whether the two operators use the same sort ordering.

So really the properties we want the AM to promise are "all operators
within one of my opfamilies have the same notion of equality" or "all
operators within one of my opfamilies use the same sort ordering".
You could reduce this to one property that means slightly different
things depending on whether amcanorder, perhaps, since "same sort
ordering" implies "same notion of equality". Maybe call it
"amconsistentsemantics"? OTOH, requiring amcanorder might be
unpleasantly much, since an AM might have btree-like operator
semantics and yet not support ordered retrieval. So perhaps two
separate flags "amconsistentequality" and "amconsistentordering"
would be better.

In any case, my gripe was less about the name of the flag and more
about the lack of a clear specification of what it means. "does AM
support cross-type comparisons" doesn't get the job done. Maybe
we can fit

/* do operators within an opfamily have consistent equality semantics? */
bool amconsistentequality;
/* do operators within an opfamily have consistent ordering semantics? */
bool amconsistentordering;

* This is trivially true if they are the same operator. Otherwise,
* we look to see if they can be found in the same btree or hash
opfamily.

* This is trivially true if they are the same operator. Otherwise,
* we look to see if they can be found in the same btree opfamily.

I agree these comments need updating.

Mark, can you suggest updated wording for those?

Maybe "Otherwise, we look to see if they both belong to an opfamily
that guarantees compatible semantics for equality" (or "for ordering"
in the second case).

BTW, the header comment for query_is_distinct_for also needs updated:

* would use itself. We use equality_ops_are_compatible() to check
* compatibility. That looks at btree or hash opfamily membership, and so
* should give trustworthy answers for all operators that we might need
* to deal with here.)

Also, I'm thinking that equality_ops_are_compatible and
comparison_ops_are_compatible now have a bit of a performance problem.
The original coding was intended to provide a cheap check before
expending the cycles to test op_in_opfamily. This patch has
completely blown that up, since GetIndexAmRoutineByAmId is *more*
expensive than op_in_opfamily. I suggest reversing things so that we
test op_in_opfamily first and only bother to look up the AM details
when we've verified that both operators belong to the same family.

regards, tom lane

#8Peter Eisentraut
peter_e@gmx.net
In reply to: Tom Lane (#7)
Re: pgsql: Generalize hash and ordering support in amapi

On 04.03.25 18:35, Tom Lane wrote:

In any case, my gripe was less about the name of the flag and more
about the lack of a clear specification of what it means. "does AM
support cross-type comparisons" doesn't get the job done. Maybe
we can fit

/* do operators within an opfamily have consistent equality semantics? */
bool amconsistentequality;
/* do operators within an opfamily have consistent ordering semantics? */
bool amconsistentordering;

Also, I'm thinking that equality_ops_are_compatible and
comparison_ops_are_compatible now have a bit of a performance problem.
The original coding was intended to provide a cheap check before
expending the cycles to test op_in_opfamily. This patch has
completely blown that up, since GetIndexAmRoutineByAmId is *more*
expensive than op_in_opfamily. I suggest reversing things so that we
test op_in_opfamily first and only bother to look up the AM details
when we've verified that both operators belong to the same family.

I have committed fixes for these issues along the lines you suggested.

#9Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Eisentraut (#8)
Re: pgsql: Generalize hash and ordering support in amapi

Peter Eisentraut <peter@eisentraut.org> writes:

I have committed fixes for these issues along the lines you suggested.

Thanks. There is a typo in hashhandler:

-   amroutine->amcancrosscompare = true;
+   amroutine->amconsistentequality = true;
+   amroutine->amconsistentequality = false;

The second line should be setting amconsistentordering = false.

Also, may I suggest one more thing? I think the test in
comparison_ops_are_compatible should be just

-           if (amroutine->amcanorder && amroutine->amconsistentordering)
+           if (amroutine->amconsistentordering)

(and the comment for it needs adjustment too). To my mind,
amconsistentordering is a static declaration that operators
within one of the AM's opfamilies are expected to have this
property. That could be true whether or not the AM is capable
of returning tuples in order. So although these flags might
commonly be set together, I think they are independent
properties.

regards, tom lane

#10Peter Eisentraut
peter_e@gmx.net
In reply to: Tom Lane (#9)
Re: pgsql: Generalize hash and ordering support in amapi

On 07.03.25 19:50, Tom Lane wrote:

Peter Eisentraut <peter@eisentraut.org> writes:

I have committed fixes for these issues along the lines you suggested.

Thanks. There is a typo in hashhandler:

-   amroutine->amcancrosscompare = true;
+   amroutine->amconsistentequality = true;
+   amroutine->amconsistentequality = false;

The second line should be setting amconsistentordering = false.

Also, may I suggest one more thing? I think the test in
comparison_ops_are_compatible should be just

-           if (amroutine->amcanorder && amroutine->amconsistentordering)
+           if (amroutine->amconsistentordering)

(and the comment for it needs adjustment too). To my mind,
amconsistentordering is a static declaration that operators
within one of the AM's opfamilies are expected to have this
property. That could be true whether or not the AM is capable
of returning tuples in order. So although these flags might
commonly be set together, I think they are independent
properties.

Agreed, done.