Index AM API cleanup
Hackers,
The index access method API mostly encapsulates the functionality of in-core index types, with some lingering oversights and layering violations. There has been an ongoing discussion for several release cycles concerning how the API might be improved to allow interesting additional functionality. That discussion has frequently included patch proposals to support peculiar needs of esoteric bespoke access methods, which have little interest for the rest of the community.
For your consideration, here is a patch series that takes a different approach. It addresses many of the limitations and layering violations, along with introducing test infrastructure to validate the changes. Nothing in this series is intended to introduce new functionality to the API. Any such, "wouldn't it be great if..." type suggestions for the API are out of scope for this work. On the other hand, this patch set does not purport to fix all such problems; it merely moves the project in that direction.
For validation purposes, the first patch creates shallow copies of hash and btree named "xash" and "xtree" and introduces some infrastructure to run the src/test/regress and src/test/isolation tests against them without needing to duplicate those tests. Numerous failures like "unexpected non-btree AM" can be observed in the test results.
Also for validation purposes, the second patch creates a deep copy of btree named "treeb" which uses modified copies of the btree implementation rather than using the btree implementation by reference. This allows for more experimentation, but might be more code than the community wants. Since this is broken into its own patch, it can be excluded from what eventually gets committed. Even if we knew a priori that this "treeb" test would surely never be committed, it still serves to help anybody reviewing the patch series to experiment with those other changes without having to construct such a test index AM individually.
The next twenty patches are a mix of fixes of various layering violations, such as not allowing non-core index AMs from use in replica identity full, or for speculative insertion, or for foreign key constraints, or as part of merge join; with updates to the "treeb" code as needed. The changes to "treeb" are broken out so that they can also easily be excluded from whatever gets committed.
The final commit changes the ordering of the strategy numbers in treeb. The name "treeb" is a rotation of "btree", and indeed the strategy numbers 1,2,3,4,5 are rotated to 5,1,2,3,4. The fact that treeb indexes work properly after this change is meant to demonstrate that the core changes have been sufficient to address the prior dependency on btree strategy number ordering. Again, this doesn't need to be committed; it might only serve to help reviewers in determining if the functional changes are correct.
Not to harp on this too heavily, but please note that running the core regression and isolation tests against xash, xtree, and treeb are known not to pass. That's the point. But by the end of the patch series, the failures are limited to EXPLAIN output changes; the query results themselves are intended to be consistent with the expected test output. To avoid breaking `make check-world`, these test modules are not added to the test schedule. They are also, at least for now, only useable from make, not from meson.
Internal development versions 1..16 not included. Andrew, Peter, and Alex have all provided reviews internally and are cc'd here. Patch by me. Here is v17 for the community:
Attachments:
v17.tar.bz2application/x-bzip2; name=v17.tar.bz2; x-unix-mode=0644Download+5-7
On Thu, 22 Aug 2024 at 00:25, Mark Dilger <mark.dilger@enterprisedb.com> wrote:
Hackers,
The index access method API mostly encapsulates the functionality of in-core index types, with some lingering oversights and layering violations. There has been an ongoing discussion for several release cycles concerning how the API might be improved to allow interesting additional functionality. That discussion has frequently included patch proposals to support peculiar needs of esoteric bespoke access methods, which have little interest for the rest of the community.
For your consideration, here is a patch series that takes a different approach. It addresses many of the limitations and layering violations, along with introducing test infrastructure to validate the changes. Nothing in this series is intended to introduce new functionality to the API. Any such, "wouldn't it be great if..." type suggestions for the API are out of scope for this work. On the other hand, this patch set does not purport to fix all such problems; it merely moves the project in that direction.
For validation purposes, the first patch creates shallow copies of hash and btree named "xash" and "xtree" and introduces some infrastructure to run the src/test/regress and src/test/isolation tests against them without needing to duplicate those tests. Numerous failures like "unexpected non-btree AM" can be observed in the test results.
Also for validation purposes, the second patch creates a deep copy of btree named "treeb" which uses modified copies of the btree implementation rather than using the btree implementation by reference. This allows for more experimentation, but might be more code than the community wants. Since this is broken into its own patch, it can be excluded from what eventually gets committed. Even if we knew a priori that this "treeb" test would surely never be committed, it still serves to help anybody reviewing the patch series to experiment with those other changes without having to construct such a test index AM individually.
The next twenty patches are a mix of fixes of various layering violations, such as not allowing non-core index AMs from use in replica identity full, or for speculative insertion, or for foreign key constraints, or as part of merge join; with updates to the "treeb" code as needed. The changes to "treeb" are broken out so that they can also easily be excluded from whatever gets committed.
The final commit changes the ordering of the strategy numbers in treeb. The name "treeb" is a rotation of "btree", and indeed the strategy numbers 1,2,3,4,5 are rotated to 5,1,2,3,4. The fact that treeb indexes work properly after this change is meant to demonstrate that the core changes have been sufficient to address the prior dependency on btree strategy number ordering. Again, this doesn't need to be committed; it might only serve to help reviewers in determining if the functional changes are correct.
Not to harp on this too heavily, but please note that running the core regression and isolation tests against xash, xtree, and treeb are known not to pass. That's the point. But by the end of the patch series, the failures are limited to EXPLAIN output changes; the query results themselves are intended to be consistent with the expected test output. To avoid breaking `make check-world`, these test modules are not added to the test schedule. They are also, at least for now, only useable from make, not from meson.
Internal development versions 1..16 not included. Andrew, Peter, and Alex have all provided reviews internally and are cc'd here. Patch by me. Here is v17 for the community:
—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hi! Why is the patch attached as .tar.bz2? Usually raw patches are sent here...
--
Best regards,
Kirill Reshke
On Aug 21, 2024, at 12:34 PM, Kirill Reshke <reshkekirill@gmail.com> wrote:
Hi! Why is the patch attached as .tar.bz2? Usually raw patches are sent here...
I worried the patch set, being greater than 1 MB, might bounce or be held up in moderation.
—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 2024-08-21 We 4:09 PM, Mark Dilger wrote:
On Aug 21, 2024, at 12:34 PM, Kirill Reshke<reshkekirill@gmail.com> wrote:
Hi! Why is the patch attached as .tar.bz2? Usually raw patches are sent here...
I worried the patch set, being greater than 1 MB, might bounce or be held up in moderation.
Yes, it would have required moderation AIUI. It is not at all
unprecedented to send a compressed tar of patches, and is explicitly
provided for by the cfbot: see
<https://wiki.postgresql.org/wiki/Cfbot#Which_attachments_are_considered_to_be_patches.3F>
cheers
andrew
--
Andrew Dunstan
EDB:https://www.enterprisedb.com
Mark Dilger <mark.dilger@enterprisedb.com> writes:
On Aug 21, 2024, at 12:34 PM, Kirill Reshke <reshkekirill@gmail.com> wrote:
Hi! Why is the patch attached as .tar.bz2? Usually raw patches are sent here...
I worried the patch set, being greater than 1 MB, might bounce or be held up in moderation.
I'm +1 for doing it like this with such a large group of patches.
Separate attachments are nice up to say half a dozen attachments,
but beyond that they're kind of a pain to deal with.
regards, tom lane
Hi Mark,
On Wed, Aug 21, 2024 at 2:25 PM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:
For validation purposes, the first patch creates shallow copies of hash and btree named "xash" and "xtree" and introduces some infrastructure to run the src/test/regress and src/test/isolation tests against them without needing to duplicate those tests. Numerous failures like "unexpected non-btree AM" can be observed in the test results.
Also for validation purposes, the second patch creates a deep copy of btree named "treeb" which uses modified copies of the btree implementation rather than using the btree implementation by reference. This allows for more experimentation, but might be more code than the community wants. Since this is broken into its own patch, it can be excluded from what eventually gets committed. Even if we knew a priori that this "treeb" test would surely never be committed, it still serves to help anybody reviewing the patch series to experiment with those other changes without having to construct such a test index AM individually.
Thank you for providing an infrastructure that allows modules to test
by reference and re-use existing regression and isolation tests. I
believe this approach ensures great coverage for the API cleanup.
I was very excited to compare the “make && make check” results in the
test modules - ‘xash,’ ‘xtree,’ and ‘treeb’ - before and after the
series of AM API fixes. Here are my results:
Before the fixes:
- xash: # 20 of 223 tests failed.
- xtree: # 95 of 223 tests failed.
- treeb: # 47 of 223 tests failed.
After the fixes:
- xash: # 21 of 223 tests failed.
- xtree: # 58 of 223 tests failed.
- treeb: # 58 of 223 tests failed.
I expected the series of fixes to eliminate all failed tests, but that
wasn’t the case. It's nice to see the failures for ‘xtree’ have significantly
decreased as the ‘unexpected non-btree AM’ errors have been resolved.
I noticed some of the remaining test failures are due to trivial index
name changes, like hash -> xash and btree -> treeb.
If we keep xtree and xash for testing, is there a way to ensure all
tests pass instead of excluding them from "make check-world"? On that
note, I ran ‘make check-world’ on the final patch, and everything
looks good.
I had to make some changes to the first two patches in order to run
"make check" and compile the treeb code on my machine. I’ve attached
my changes.
"make installcheck" for treeb is causing issues on my end. I can
investigate further if it’s not a problem for others.
Best,
Alex
On Aug 22, 2024, at 1:36 AM, Alexandra Wang <alexandra.wang.oss@gmail.com> wrote:
I had to make some changes to the first two patches in order to run
"make check" and compile the treeb code on my machine. I’ve attached
my changes.
Thank you for the review, and the patches!
"make installcheck" for treeb is causing issues on my end. I can
investigate further if it’s not a problem for others.
The test module index AMs are not intended for use in any installed database, so 'make installcheck' is unnecessary. A mere 'make check' should suffice. However, if you want to run it, you can install the modules, edit postgresql.conf to add 'treeb' to shared_preload_libraries, restart the server, and run 'make installcheck'. This is necessary for 'treeb' because it requests shared memory, and that needs to be done at startup.
The v18 patch set includes the changes your patches suggest, though I modified the approach a bit. Specifically, rather than standardizing on '1.0.0' for the module versions, as your patches do, I went with '1.0', as is standard in other modules in neighboring directories. The '1.0.0' numbering was something I had been using in versions 1..16 of this patch, and I only partially converted to '1.0' before posting v17, so sorry about that. The v18 patch also has some whitespace fixes.
To address your comments about the noise in the test failures, v18 modifies the clone_tests.pl script to do a little more work translating the expected output to expect the module's AM name ("xash", "xtree", "treeb", or whatnot) beyond what that script did in v17.
Attachments:
v18.tar.bz2application/x-bzip2; name=v18.tar.bz2; x-unix-mode=0644Download+5-0
On 21.08.24 21:25, Mark Dilger wrote:
The next twenty patches are a mix of fixes of various layering
violations, such as not allowing non-core index AMs from use in replica
identity full, or for speculative insertion, or for foreign key
constraints, or as part of merge join; with updates to the "treeb" code
as needed. The changes to "treeb" are broken out so that they can also
easily be excluded from whatever gets committed.
I made a first pass through this patch set. I think the issues it aims
to address are mostly legitimate. In a few cases, we might need some
more discussion and perhaps will end up slicing the APIs a bit
differently. The various patches that generalize the strategy numbers
appear to overlap with things being discussed at [0]/messages/by-id/CA+renyUApHgSZF9-nd-a0+OPGharLQLO=mDHcY4_qQ0+noCUVg@mail.gmail.com, so we should see
that the solution covers all the use cases.
[0]: /messages/by-id/CA+renyUApHgSZF9-nd-a0+OPGharLQLO=mDHcY4_qQ0+noCUVg@mail.gmail.com
/messages/by-id/CA+renyUApHgSZF9-nd-a0+OPGharLQLO=mDHcY4_qQ0+noCUVg@mail.gmail.com
To make a dent, I picked out something that should be mostly harmless:
Stop calling directly into _bt_getrootheight() (patch 0004). I think
this patch is ok, but I might call the API function amgettreeheight
instead of amgetrootheight. The former seems more general.
Also, a note for us all in this thread, changes to the index AM API need
updates to the corresponding documentation in doc/src/sgml/indexam.sgml.
I notice that _bt_getrootheight() is called only to fill in the
IndexOptInfo tree_height field, which is only used by btcostestimate(),
so in some sense this is btree-internal data. But making it so that
btcostestimate() calls _bt_getrootheight() directly to avoid all that
intermediate business seems too complicated, and there was probably a
reason that the cost estimation functions don't open the index.
Interestingly, the cost estimation functions for gist and spgist also
look at the tree_height field but nothing ever fills it on. So with
your API restructuring, someone could provide the missing API functions
for those index types. Might be interesting.
That said, there might be value in generalizing this a bit. If you look
at the cost estimation functions in pgvector (hnswcostestimate() and
ivfflatcostestimate()), they both have this pattern that
btcostestimate() tries to avoid: They open the index, look up some
number, close the index, then make a cost estimate computation with the
number looked up. So another idea would be to generalize the
tree_height field to some "index size data" or even "internal data for
cost estimation". This wouldn't need to change the API much, since
these are all just integer values, but we'd label the functions and
fields a bit differently.
On Aug 26, 2024, at 5:21 AM, Peter Eisentraut <peter@eisentraut.org> wrote:
On 21.08.24 21:25, Mark Dilger wrote:
The next twenty patches are a mix of fixes of various layering
violations, such as not allowing non-core index AMs from use in replica
identity full, or for speculative insertion, or for foreign key
constraints, or as part of merge join; with updates to the "treeb" code
as needed. The changes to "treeb" are broken out so that they can also
easily be excluded from whatever gets committed.I made a first pass through this patch set.
Peter, thanks for the review!
I think the issues it aims to address are mostly legitimate. In a few cases, we might need some more discussion and perhaps will end up slicing the APIs a bit differently. The various patches that generalize the strategy numbers appear to overlap with things being discussed at [0], so we should see that the solution covers all the use cases.
[0]: /messages/by-id/CA+renyUApHgSZF9-nd-a0+OPGharLQLO=mDHcY4_qQ0+noCUVg@mail.gmail.com
Paul, it seems what you are doing in v39-0001-Add-stratnum-GiST-support-function.patch is similar to what I am doing in v17-0012-Convert-strategies-to-and-from-row-compare-types.patch. In particular, your function
+
+/*
+ * Returns the btree number for supported operators, otherwise invalid.
+ */
+Datum
+gist_stratnum_btree(PG_FUNCTION_ARGS)
+{
+ StrategyNumber strat = PG_GETARG_UINT16(0);
+
+ switch (strat)
+ {
+ case RTEqualStrategyNumber:
+ PG_RETURN_UINT16(BTEqualStrategyNumber);
+ case RTLessStrategyNumber:
+ PG_RETURN_UINT16(BTLessStrategyNumber);
+ case RTLessEqualStrategyNumber:
+ PG_RETURN_UINT16(BTLessEqualStrategyNumber);
+ case RTGreaterStrategyNumber:
+ PG_RETURN_UINT16(BTGreaterStrategyNumber);
+ case RTGreaterEqualStrategyNumber:
+ PG_RETURN_UINT16(BTGreaterEqualStrategyNumber);
+ default:
+ PG_RETURN_UINT16(InvalidStrategy);
+ }
+}
looks similar to the implementation of an amtranslate_rctype_function. Do you have any interest in taking a look?
To make a dent, I picked out something that should be mostly harmless: Stop calling directly into _bt_getrootheight() (patch 0004). I think this patch is ok, but I might call the API function amgettreeheight instead of amgetrootheight. The former seems more general.
Peter, your proposed rename seems fine for the current implementation, but your suggestion below might indicate a different naming.
I notice that _bt_getrootheight() is called only to fill in the IndexOptInfo tree_height field, which is only used by btcostestimate(), so in some sense this is btree-internal data. But making it so that btcostestimate() calls _bt_getrootheight() directly to avoid all that intermediate business seems too complicated, and there was probably a reason that the cost estimation functions don't open the index.
Interestingly, the cost estimation functions for gist and spgist also look at the tree_height field but nothing ever fills it on. So with your API restructuring, someone could provide the missing API functions for those index types. Might be interesting.
That said, there might be value in generalizing this a bit. If you look at the cost estimation functions in pgvector (hnswcostestimate() and ivfflatcostestimate()), they both have this pattern that btcostestimate() tries to avoid: They open the index, look up some number, close the index, then make a cost estimate computation with the number looked up. So another idea would be to generalize the tree_height field to some "index size data" or even "internal data for cost estimation". This wouldn't need to change the API much, since these are all just integer values, but we'd label the functions and fields a bit differently.
Would they be just integers? They could also be void*, with amgetrootheight_function returning data allocated in the current memory context. For btree, that would just be a four byte integer, but other indexes could return whatever they like. If you like that idea, I can code that up for v18, naming the field something like amgetcostestimateinfo_function.
—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 2024-08-22 Th 2:28 PM, Mark Dilger wrote:
On Aug 22, 2024, at 1:36 AM, Alexandra Wang<alexandra.wang.oss@gmail.com> wrote:
I had to make some changes to the first two patches in order to run
"make check" and compile the treeb code on my machine. I’ve attached
my changes.Thank you for the review, and the patches!
"make installcheck" for treeb is causing issues on my end. I can
investigate further if it’s not a problem for others.The test module index AMs are not intended for use in any installed database, so 'make installcheck' is unnecessary. A mere 'make check' should suffice. However, if you want to run it, you can install the modules, edit postgresql.conf to add 'treeb' to shared_preload_libraries, restart the server, and run 'make installcheck'. This is necessary for 'treeb' because it requests shared memory, and that needs to be done at startup.
The v18 patch set includes the changes your patches suggest, though I modified the approach a bit. Specifically, rather than standardizing on '1.0.0' for the module versions, as your patches do, I went with '1.0', as is standard in other modules in neighboring directories. The '1.0.0' numbering was something I had been using in versions 1..16 of this patch, and I only partially converted to '1.0' before posting v17, so sorry about that. The v18 patch also has some whitespace fixes.
To address your comments about the noise in the test failures, v18 modifies the clone_tests.pl script to do a little more work translating the expected output to expect the module's AM name ("xash", "xtree", "treeb", or whatnot) beyond what that script did in v17.
Small addition to your clone script, taking into account the existence
of alternative result files:
diff --git a/src/tools/clone_tests.pl b/src/tools/clone_tests.pl
index c1c50ad90b..d041f93f87 100755
--- a/src/tools/clone_tests.pl
+++ b/src/tools/clone_tests.pl
@@ -183,6 +183,12 @@ foreach my $regress (@regress)
push (@dst_regress, "$dst_sql/$regress.sql");
push (@src_expected, "$src_expected/$regress.out");
push (@dst_expected, "$dst_expected/$regress.out");
+ foreach my $alt (1,2)
+ {
+ next unless -f "$src_expected/${regress}_$alt.out";
+ push (@src_expected, "$src_expected/${regress}_$alt.out");
+ push (@dst_expected, "$dst_expected/${regress}_$alt.out");
+ }
}
my @isolation = grep /\w+/, split(/\s+/, $isolation);
foreach my $isolation (@isolation)
cheers
andrew
--
Andrew Dunstan
EDB:https://www.enterprisedb.com
On 26.08.24 17:10, Mark Dilger wrote:
To make a dent, I picked out something that should be mostly harmless: Stop calling directly into _bt_getrootheight() (patch 0004). I think this patch is ok, but I might call the API function amgettreeheight instead of amgetrootheight. The former seems more general.
Peter, your proposed rename seems fine for the current implementation, but your suggestion below might indicate a different naming.
I notice that _bt_getrootheight() is called only to fill in the IndexOptInfo tree_height field, which is only used by btcostestimate(), so in some sense this is btree-internal data. But making it so that btcostestimate() calls _bt_getrootheight() directly to avoid all that intermediate business seems too complicated, and there was probably a reason that the cost estimation functions don't open the index.
Interestingly, the cost estimation functions for gist and spgist also look at the tree_height field but nothing ever fills it on. So with your API restructuring, someone could provide the missing API functions for those index types. Might be interesting.
That said, there might be value in generalizing this a bit. If you look at the cost estimation functions in pgvector (hnswcostestimate() and ivfflatcostestimate()), they both have this pattern that btcostestimate() tries to avoid: They open the index, look up some number, close the index, then make a cost estimate computation with the number looked up. So another idea would be to generalize the tree_height field to some "index size data" or even "internal data for cost estimation". This wouldn't need to change the API much, since these are all just integer values, but we'd label the functions and fields a bit differently.
Would they be just integers? They could also be void*, with amgetrootheight_function returning data allocated in the current memory context. For btree, that would just be a four byte integer, but other indexes could return whatever they like. If you like that idea, I can code that up for v18, naming the field something like amgetcostestimateinfo_function.
Here is a cleaned-up version of the v17-0004 patch. I have applied the
renaming discussed above. I have also made a wrapper function
btgettreeheight() that calls _bt_getrootheight(). That way, if we ever
want to change the API, we don't have to change _bt_getrootheight(), or
disentangle it then. I've also added documentation and put in a source
code comment that the API could be expanded for additional uses. Also,
I have removed the addition to the IndexOptInfo struct; that didn't seem
necessary.
Attachments:
v17.1-0004-Add-amgettreeheight-index-AM-API-routine.patchtext/plain; charset=UTF-8; name=v17.1-0004-Add-amgettreeheight-index-AM-API-routine.patchDownload+47-7
On Sep 3, 2024, at 9:52 AM, Peter Eisentraut <peter@eisentraut.org> wrote:
Here is a cleaned-up version of the v17-0004 patch. I have applied the renaming discussed above. I have also made a wrapper function btgettreeheight() that calls _bt_getrootheight(). That way, if we ever want to change the API, we don't have to change _bt_getrootheight(), or disentangle it then. I've also added documentation and put in a source code comment that the API could be expanded for additional uses.
Ok.
Also, I have removed the addition to the IndexOptInfo struct; that didn't seem necessary.
Good catch. I agree with the change.
—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thu, Aug 22, 2024 at 11:28 AM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:
On Aug 22, 2024, at 1:36 AM, Alexandra Wang <alexandra.wang.oss@gmail.com> wrote:
"make installcheck" for treeb is causing issues on my end. I can
investigate further if it’s not a problem for others.The test module index AMs are not intended for use in any installed database, so 'make installcheck' is unnecessary. A mere 'make check' should suffice. However, if you want to run it, you can install the modules, edit postgresql.conf to add 'treeb' to shared_preload_libraries, restart the server, and run 'make installcheck'. This is necessary for 'treeb' because it requests shared memory, and that needs to be done at startup.
Thanks, Mark. This works, and I can run treeb now.
The v18 patch set includes the changes your patches suggest, though I modified the approach a bit. Specifically, rather than standardizing on '1.0.0' for the module versions, as your patches do, I went with '1.0', as is standard in other modules in neighboring directories. The '1.0.0' numbering was something I had been using in versions 1..16 of this patch, and I only partially converted to '1.0' before posting v17, so sorry about that. The v18 patch also has some whitespace fixes.
To address your comments about the noise in the test failures, v18 modifies the clone_tests.pl script to do a little more work translating the expected output to expect the module's AM name ("xash", "xtree", "treeb", or whatnot) beyond what that script did in v17.
Thanks! I see fewer failures now.
There are a few occurrences of the following error when I run "make
check" in the xtree module:
+ERROR: bogus RowCompare index qualification
I needed to specify `amroutine->amcancrosscompare = true;` in xtree.c
to eliminate them, as below:
diff --git a/src/test/modules/xtree/access/xtree.c
b/src/test/modules/xtree/access/xtree.c
index bd472edb04..960966801d 100644
--- a/src/test/modules/xtree/access/xtree.c
+++ b/src/test/modules/xtree/access/xtree.c
@@ -33,6 +33,7 @@ xtree_indexam_handler(PG_FUNCTION_ARGS)
amroutine->amoptsprocnum = BTOPTIONS_PROC;
amroutine->amcanorder = true;
amroutine->amcanhash = false;
+ amroutine->amcancrosscompare = true;
amroutine->amcanorderbyop = false;
amroutine->amcanbackward = true;
amroutine->amcanunique = true;
After adding that, the regression.diffs for the xtree and treeb
modules are identical in content, with only plan diffs remaining. I
think this change should be either part of
v18-0008-Generalize-hash-and-ordering-support-in-amapi.patch or a
separate commit, similar to
v18-0009-Adjust-treeb-to-use-amcanhash-and-amcancrosscomp.patch.
Best,
Alex
On 03.09.24 19:26, Mark Dilger wrote:
On Sep 3, 2024, at 9:52 AM, Peter Eisentraut <peter@eisentraut.org> wrote:
Here is a cleaned-up version of the v17-0004 patch. I have applied the renaming discussed above. I have also made a wrapper function btgettreeheight() that calls _bt_getrootheight(). That way, if we ever want to change the API, we don't have to change _bt_getrootheight(), or disentangle it then. I've also added documentation and put in a source code comment that the API could be expanded for additional uses.
Ok.
Also, I have removed the addition to the IndexOptInfo struct; that didn't seem necessary.
Good catch. I agree with the change.
I have committed this patch. (It needed another small change to silence
a compiler warning about an uninitialized variable.)
Next, I have reviewed patches
v17-0010-Track-sort-direction-in-SortGroupClause.patch
v17-0011-Track-scan-reversals-in-MergeJoin.patch
Both of these seem ok and sensible to me.
They take the concept of the "reverse" flag that already exists in the
affected code and just apply it more consistently throughout the various
code layers, instead of relying on strategy numbers as intermediate
storage. This is both helpful for your ultimate goal in this patch
series, and it also makes the affected code areas simpler and more
consistent and robust.
On Sep 24, 2024, at 10:50 AM, Peter Eisentraut <peter@eisentraut.org> wrote:
Next, I have reviewed patches
v17-0010-Track-sort-direction-in-SortGroupClause.patch
v17-0011-Track-scan-reversals-in-MergeJoin.patchBoth of these seem ok and sensible to me.
They take the concept of the "reverse" flag that already exists in the affected code and just apply it more consistently throughout the various code layers, instead of relying on strategy numbers as intermediate storage. This is both helpful for your ultimate goal in this patch series, and it also makes the affected code areas simpler and more consistent and robust.
Thanks for the review!
Yes, I found the existing use of a btree strategy number rather than a boolean "reverse" flag made using the code from other index AMs needlessly harder. I am glad you see it the same way.
—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 24.09.24 11:09, Mark Dilger wrote:
On Sep 24, 2024, at 10:50 AM, Peter Eisentraut <peter@eisentraut.org> wrote:
Next, I have reviewed patches
v17-0010-Track-sort-direction-in-SortGroupClause.patch
v17-0011-Track-scan-reversals-in-MergeJoin.patchBoth of these seem ok and sensible to me.
They take the concept of the "reverse" flag that already exists in the affected code and just apply it more consistently throughout the various code layers, instead of relying on strategy numbers as intermediate storage. This is both helpful for your ultimate goal in this patch series, and it also makes the affected code areas simpler and more consistent and robust.
Thanks for the review!
Yes, I found the existing use of a btree strategy number rather than a boolean "reverse" flag made using the code from other index AMs needlessly harder. I am glad you see it the same way.
committed
I want to call out one particular aspect that is central to this patch
series that needs more broader discussion and agreement.
The problem is that btree indexes (and to a lesser extent hash
indexes) are hard-coded in various places. This includes various
places in the optimizer (see patch v17-0018) that might be harder to
understand, but also more easy-to-understand places such as:
- Only hash and btree are supported for replica identity. (see patch
v17-0015)
- Only btree is supported for speculative insertion. (see patch v17-0016)
- Only btree is supported for foreign keys. (see patch v17-0017)
- Only btree can participate in merge join. (see patch v17-0020)
The problem in cases such as these, and some of them actually contain
code comments about this, is that we somehow need to know what the
equality operator (and in some cases ordering operator) for a type is.
And the way this is currently done is that the btree and hash strategy
numbers are hardcoded, and other index AMs are just not supported
because there is no way to find that out from those.
We faced a similar issue for the temporal primary keys feature. One
thing this feature had to overcome is that
- Only btree is supported for primary keys.
For that feature, we wanted to add support for gist indexes, and we
had to again find a way to get the equals operator (and also the
overlaps operator).
The solution we ended up with there is to add a support function to
gist that translates the strategy numbers used by the operator class
to "well-known" strategy numbers that we can understand (commit
7406ab623fe). It later turned out that this problem actually might be
a subset of the general problem being discussed here, so we can also
change that solution if we find a more general solution here.
So the question is, how do we communicate these fundamental semantics
of operators?
The solution we discussed for temporal primary keys is that we just
mandate that certain strategy numbers have fixed meanings. This is
how btree and hash work anyway, but this was never required for gist.
And there are many gist opclass implementations already out there that
don't use compatible strategy numbers, so that solution would not have
worked (at least without some upgrading pain). Hence the idea of
mapping the actual strategy numbers to a fixed set of well-known ones.
For the general case across all index AMs, we could similarly decree
that all index AMs use certain strategy numbers for fixed purposes.
But again, this already wouldn't work for gist and perhaps others
already out there. So here again we need some kind of mapping to
numbers with a fixed purpose. Or perhaps new index AMs can start out
with the "correct" numbers and only some existing ones would need the
mapping.
And then what do you map them to?
One idea would be to map them to the existing btree numbers. This
could work, but I always found it kind of confusing that you'd then
have multiple sets of strategy numbers in play, one native to the
index AM or opclass, and one sort of global one.
Another idea is to map them to another thing altogether, a new enum.
This is what this patch series does, essentially. But it turns out
(so argues the patch series), that this enum already exists, namely
typedef enum RowCompareType
{
/* Values of this enum are chosen to match btree strategy numbers */
ROWCOMPARE_LT = 1, /* BTLessStrategyNumber */
ROWCOMPARE_LE = 2, /* BTLessEqualStrategyNumber */
ROWCOMPARE_EQ = 3, /* BTEqualStrategyNumber */
ROWCOMPARE_GE = 4, /* BTGreaterEqualStrategyNumber */
ROWCOMPARE_GT = 5, /* BTGreaterStrategyNumber */
ROWCOMPARE_NE = 6, /* no such btree strategy */
} RowCompareType;
which in spite of the name, yes, sounds like exactly that kind of
thing. And it's also used in many places for this kind of thing
already.
The patch series adds index AM callbacks
amtranslatestrategy
amtranslaterctype
to convert between strategy number and RowCompareType, and also adds
ROWCOMPARE_NONE
ROWCOMPARE_INVALID
to handle cases where there is no mapping. I suppose for the temporal
PK use, we would need to add something like ROWCOMPARE_OVERLAPS.
So this is the idea. To take a step back, I can see the following
options:
1. Require all index AMs to use btree-compatible strategy numbers.
(Previously rejected, too much upgrading mess.)
2. Provide mapping between index AM strategy numbers and btree strategy
numbers. (This doesn't have a space for non-btree operations like
overlaps.)
3. Provide mapping between index AM strategy numbers and the existing
RT*StrategyNumber numbers. (We can expand the set as we want.)
(This is what the existing mapping function for gist does.)
4. Provide mapping between index AM strategy numbers and some
completely new set of numbers/symbols.
5. Provide mapping between index AM strategy numbers and
RowCompareType (with some small extensions). This is what this
patch does.
Other ideas? Thoughts?
On Oct 30, 2024, at 12:54 PM, Peter Eisentraut <peter@eisentraut.org> wrote:
So this is the idea. To take a step back, I can see the following
options:1. Require all index AMs to use btree-compatible strategy numbers.
(Previously rejected, too much upgrading mess.)2. Provide mapping between index AM strategy numbers and btree strategy
numbers. (This doesn't have a space for non-btree operations like
overlaps.)
I agree that neither of these options are good, for the reasons you mention.
3. Provide mapping between index AM strategy numbers and the existing
RT*StrategyNumber numbers. (We can expand the set as we want.)
(This is what the existing mapping function for gist does.)
The point of such a mapping is that core code outside any index AM can know what an AM is claiming it can do when it advertises that it implements one of these strategy numbers. We don't need any new entries in that mapping until some core feature depends on the semantics of the new entry. Right now, all six of the btree comparators (including not-equal) have semantics that are used outside AMs by functions like match_clause_to_ordering_op(). If any index AM author comes along and creates an index AM which purports to provide these six semantics but actually does something semantically inconsistent with what the backend thinks these mean, that index AM is totally at fault when, for example, ORDER BY returns the wrong results.
On the other hand, if we add RTOverlapStrategyNumber to the common framework of strategy numbers, without anything outside an index AM depending on that, how is an index AM author to know exactly how an "overlaps" operator is supposed to behave? No doubt, brin, gist, spgist, and friends all have their own understanding of what RTOverlapStrategyNumber means, but how is a new index AM supposed to know if it has analogized that concept correctly to its own operator? And if several major versions later, you come along to create some feature, let's say a logical replication feature depending on "overlaps" semantics, how are you to know whether all the index AMs in the wild which advertise they provide an "overlaps" operator will work correctly with your new feature? When logical replication breaks, who is at fault? Perversely, knowing that RTOverlapStrategyNumber is already in the list, you would likely implement the new logical replication feature on some new strategy number, perhaps naming it RTLogicalOverlapStrategyNumber, to avoid such conflicts.
The RT*StrategyNumber list is much too long, containing many such problematic entries. We should not, in my opinion, add these to the list prior to some new feature which depends on them, such as a planner or executor optimization.
4. Provide mapping between index AM strategy numbers and some
completely new set of numbers/symbols.
This is fine, if the new set is sufficiently restricted. However, as mentioned below, the set of sufficiently restricted values is identical to what we currently define as a RowCompareType. It creates needless code churn to throw that away and replace it with a new name.
5. Provide mapping between index AM strategy numbers and
RowCompareType (with some small extensions). This is what this
patch does.
As the patch author, obviously this is the one I chose. The "small extensions" are just to handle "no such value" type cases.
—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thu, 31 Oct 2024 at 15:02, Mark Dilger <mark.dilger@enterprisedb.com> wrote:
On Oct 30, 2024, at 12:54 PM, Peter Eisentraut <peter@eisentraut.org> wrote:
So this is the idea. To take a step back, I can see the following
options:1. Require all index AMs to use btree-compatible strategy numbers.
(Previously rejected, too much upgrading mess.)2. Provide mapping between index AM strategy numbers and btree strategy
numbers. (This doesn't have a space for non-btree operations like
overlaps.)I agree that neither of these options are good, for the reasons you mention.
3. Provide mapping between index AM strategy numbers and the existing
RT*StrategyNumber numbers. (We can expand the set as we want.)
(This is what the existing mapping function for gist does.)The point of such a mapping is that core code outside any index AM can know what an AM is claiming it can do when it advertises that it implements one of these strategy numbers. We don't need any new entries in that mapping until some core feature depends on the semantics of the new entry. Right now, all six of the btree comparators (including not-equal) have semantics that are used outside AMs by functions like match_clause_to_ordering_op(). If any index AM author comes along and creates an index AM which purports to provide these six semantics but actually does something semantically inconsistent with what the backend thinks these mean, that index AM is totally at fault when, for example, ORDER BY returns the wrong results.
On the other hand, if we add RTOverlapStrategyNumber to the common framework of strategy numbers, without anything outside an index AM depending on that, how is an index AM author to know exactly how an "overlaps" operator is supposed to behave? No doubt, brin, gist, spgist, and friends all have their own understanding of what RTOverlapStrategyNumber means, but how is a new index AM supposed to know if it has analogized that concept correctly to its own operator? And if several major versions later, you come along to create some feature, let's say a logical replication feature depending on "overlaps" semantics, how are you to know whether all the index AMs in the wild which advertise they provide an "overlaps" operator will work correctly with your new feature? When logical replication breaks, who is at fault? Perversely, knowing that RTOverlapStrategyNumber is already in the list, you would likely implement the new logical replication feature on some new strategy number, perhaps naming it RTLogicalOverlapStrategyNumber, to avoid such conflicts.
The RT*StrategyNumber list is much too long, containing many such problematic entries. We should not, in my opinion, add these to the list prior to some new feature which depends on them, such as a planner or executor optimization.
4. Provide mapping between index AM strategy numbers and some
completely new set of numbers/symbols.This is fine, if the new set is sufficiently restricted. However, as mentioned below, the set of sufficiently restricted values is identical to what we currently define as a RowCompareType. It creates needless code churn to throw that away and replace it with a new name.
5. Provide mapping between index AM strategy numbers and
RowCompareType (with some small extensions). This is what this
patch does.As the patch author, obviously this is the one I chose. The "small extensions" are just to handle "no such value" type cases.
—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hi! Can we please have a rebased version of this patch series?
--
Best regards,
Kirill Reshke