Index AM API cleanup

Started by Mark Dilger, almost 2 years ago, 48 messages, hackers

mark.dilger@enterprisedb.com

almost 2 years ago

Hackers,

The index access method API mostly encapsulates the functionality of in-core index types, with some lingering oversights and layering violations. There has been an ongoing discussion for several release cycles concerning how the API might be improved to allow interesting additional functionality. That discussion has frequently included patch proposals to support peculiar needs of esoteric bespoke access methods, which are of little interest to the rest of the community.

For your consideration, here is a patch series that takes a different approach. It addresses many of the limitations and layering violations, along with introducing test infrastructure to validate the changes. Nothing in this series is intended to introduce new functionality to the API. Any "wouldn't it be great if..." suggestions for the API are out of scope for this work. On the other hand, this patch set does not purport to fix all such problems; it merely moves the project in that direction.

For validation purposes, the first patch creates shallow copies of hash and btree named "xash" and "xtree" and introduces some infrastructure to run the src/test/regress and src/test/isolation tests against them without needing to duplicate those tests. Numerous failures like "unexpected non-btree AM" can be observed in the test results.

Also for validation purposes, the second patch creates a deep copy of btree named "treeb" which uses modified copies of the btree implementation rather than using the btree implementation by reference. This allows for more experimentation, but might be more code than the community wants. Since this is broken into its own patch, it can be excluded from what eventually gets committed. Even if we knew a priori that this "treeb" test would surely never be committed, it still serves to help anybody reviewing the patch series to experiment with those other changes without having to construct such a test index AM individually.

The next twenty patches are a mix of fixes for various layering violations, such as core code not allowing non-core index AMs to be used for replica identity full, for speculative insertion, for foreign key constraints, or as part of merge join, along with updates to the "treeb" code as needed. The changes to "treeb" are broken out so that they can also easily be excluded from whatever gets committed.
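
As an illustration of the kind of layering violation meant here, the following is a minimal sketch and is not taken from the patch series. It contrasts a check on the access method's identity with a check on a capability the access method advertises. `SketchIndexAm` and both helper functions are hypothetical names invented for this example; `amcanorder` mirrors the `IndexAmRoutine` field of the same name, and 403 is the OID of the built-in btree AM in `pg_am`.

```c
#include <assert.h>
#include <stdbool.h>

#define BTREE_AM_OID 403        /* OID of the built-in btree AM in pg_am */

/* Hypothetical, trimmed-down stand-in for an index AM description. */
typedef struct SketchIndexAm
{
    unsigned int am_oid;        /* which access method this is */
    bool         amcanorder;    /* mirrors IndexAmRoutine.amcanorder */
} SketchIndexAm;

/*
 * Identity-based check: only the in-core btree qualifies, so a clone
 * with identical behavior but a different OID is rejected.
 */
static bool
usable_by_identity(const SketchIndexAm *am)
{
    return am->am_oid == BTREE_AM_OID;
}

/*
 * Capability-based check: any AM that advertises ordering support
 * qualifies, regardless of which AM it is.
 */
static bool
usable_by_capability(const SketchIndexAm *am)
{
    return am->amcanorder;
}
```

Under these assumptions, an out-of-core clone of btree fails the first check and passes the second, even though its behavior is the same as btree's.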

The final commit changes the ordering of the strategy numbers in treeb. The name "treeb" is a rotation of "btree", and indeed the strategy numbers 1,2,3,4,5 are rotated to 5,1,2,3,4. The fact that treeb indexes work properly after this change is meant to demonstrate that the core changes have been sufficient to address the prior dependency on btree strategy number ordering. Again, this doesn't need to be committed; it might only serve to help reviewers in determining if the functional changes are correct.
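
The rotation can be sketched as follows. This is a minimal illustration, not code from the patch: it assumes the mapping is position-wise (btree strategy 1 becomes 5, 2 becomes 1, and so on), and the two helper functions are hypothetical. The enum values are the btree strategy numbers from PostgreSQL's `access/stratnum.h`, redeclared locally so the sketch compiles on its own.

```c
#include <assert.h>

/* btree strategy numbers, as defined in PostgreSQL's access/stratnum.h */
enum
{
    BTLessStrategyNumber = 1,
    BTLessEqualStrategyNumber = 2,
    BTEqualStrategyNumber = 3,
    BTGreaterEqualStrategyNumber = 4,
    BTGreaterStrategyNumber = 5,
    BTMaxStrategyNumber = 5
};

/*
 * Hypothetical helper: map a btree strategy number to its rotated
 * counterpart, so that 1,2,3,4,5 become 5,1,2,3,4.
 */
static int
treeb_rotated_strategy(int btree_strategy)
{
    assert(btree_strategy >= 1 && btree_strategy <= BTMaxStrategyNumber);
    return (btree_strategy + 3) % BTMaxStrategyNumber + 1;
}

/*
 * Hypothetical inverse: map a rotated strategy number back to the
 * btree numbering, so that 5,1,2,3,4 become 1,2,3,4,5.
 */
static int
treeb_original_strategy(int rotated_strategy)
{
    assert(rotated_strategy >= 1 && rotated_strategy <= BTMaxStrategyNumber);
    return rotated_strategy % BTMaxStrategyNumber + 1;
}
```

With this mapping, equality is no longer strategy 3 and "less than" is no longer the smallest number, so any core code that compares a strategy number against the btree constants, or relies on their relative order, would misbehave for such an AM.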

Not to harp on this too heavily, but please note that the core regression and isolation tests are known not to pass when run against xash, xtree, and treeb. That's the point. But by the end of the patch series, the failures are limited to EXPLAIN output changes; the query results themselves are intended to be consistent with the expected test output. To avoid breaking `make check-world`, these test modules are not added to the test schedule. They are also, at least for now, only usable from make, not from meson.

Internal development versions 1..16 not included. Andrew, Peter, and Alex have all provided reviews internally and are cc'd here. Patch by me. Here is v17 for the community:

Kirill Reshke

reshkekirill@gmail.com

almost 2 years ago

In reply to: Mark Dilger (#1)

Re: Index AM API cleanup

Hi! Why is the patch attached as .tar.bz2? Usually raw patches are sent here...

--
Best regards,
Kirill Reshke

Mark Dilger

mark.dilger@enterprisedb.com

almost 2 years ago

In reply to: Kirill Reshke (#2)

Re: Index AM API cleanup

On Aug 21, 2024, at 12:34 PM, Kirill Reshke <reshkekirill@gmail.com> wrote:

> Hi! Why is the patch attached as .tar.bz2? Usually raw patches are sent here...

I worried the patch set, being greater than 1 MB, might bounce or be held up in moderation.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Andrew Dunstan

andrew@dunslane.net

almost 2 years ago

In reply to: Mark Dilger (#3)

Re: Index AM API cleanup

On 2024-08-21 We 4:09 PM, Mark Dilger wrote:

> On Aug 21, 2024, at 12:34 PM, Kirill Reshke <reshkekirill@gmail.com> wrote:
>
>> Hi! Why is the patch attached as .tar.bz2? Usually raw patches are sent here...
>
> I worried the patch set, being greater than 1 MB, might bounce or be held up in moderation.

Yes, it would have required moderation AIUI. It is not at all
unprecedented to send a compressed tar of patches, and is explicitly
provided for by the cfbot: see
<https://wiki.postgresql.org/wiki/Cfbot#Which_attachments_are_considered_to_be_patches.3F>

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

Tom Lane

tgl@sss.pgh.pa.us

almost 2 years ago

In reply to: Mark Dilger (#3)

Re: Index AM API cleanup

Mark Dilger <mark.dilger@enterprisedb.com> writes:

> On Aug 21, 2024, at 12:34 PM, Kirill Reshke <reshkekirill@gmail.com> wrote:
>> Hi! Why is the patch attached as .tar.bz2? Usually raw patches are sent here...
>
> I worried the patch set, being greater than 1 MB, might bounce or be held up in moderation.

I'm +1 for doing it like this with such a large group of patches.
Separate attachments are nice up to say half a dozen attachments,
but beyond that they're kind of a pain to deal with.

regards, tom lane

Alexandra Wang

alexandra.wang.oss@gmail.com

almost 2 years ago

In reply to: Mark Dilger (#1)

Re: Index AM API cleanup

Hi Mark,

On Wed, Aug 21, 2024 at 2:25 PM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:

> For validation purposes, the first patch creates shallow copies of hash and btree named "xash" and "xtree" and introduces some infrastructure to run the src/test/regress and src/test/isolation tests against them without needing to duplicate those tests. Numerous failures like "unexpected non-btree AM" can be observed in the test results.
>
> Also for validation purposes, the second patch creates a deep copy of btree named "treeb" which uses modified copies of the btree implementation rather than using the btree implementation by reference. This allows for more experimentation, but might be more code than the community wants. Since this is broken into its own patch, it can be excluded from what eventually gets committed. Even if we knew a priori that this "treeb" test would surely never be committed, it still serves to help anybody reviewing the patch series to experiment with those other changes without having to construct such a test index AM individually.

Thank you for providing an infrastructure that allows modules to test
by reference and re-use existing regression and isolation tests. I
believe this approach ensures great coverage for the API cleanup.
I was very excited to compare the “make && make check” results in the
test modules - ‘xash,’ ‘xtree,’ and ‘treeb’ - before and after the
series of AM API fixes. Here are my results:

Before the fixes:
- xash: # 20 of 223 tests failed.
- xtree: # 95 of 223 tests failed.
- treeb: # 47 of 223 tests failed.

After the fixes:
- xash: # 21 of 223 tests failed.
- xtree: # 58 of 223 tests failed.
- treeb: # 58 of 223 tests failed.

I expected the series of fixes to eliminate all failed tests, but that
wasn’t the case. It's nice to see the failures for ‘xtree’ have significantly
decreased as the ‘unexpected non-btree AM’ errors have been resolved.
I noticed some of the remaining test failures are due to trivial index
name changes, like hash -> xash and btree -> treeb.

If we keep xtree and xash for testing, is there a way to ensure all
tests pass instead of excluding them from "make check-world"? On that
note, I ran ‘make check-world’ on the final patch, and everything
looks good.

I had to make some changes to the first two patches in order to run
"make check" and compile the treeb code on my machine. I’ve attached
my changes.

"make installcheck" for treeb is causing issues on my end. I can
investigate further if it’s not a problem for others.

Best,
Alex

Mark Dilger

mark.dilger@enterprisedb.com

almost 2 years ago

In reply to: Alexandra Wang (#6)

Re: Index AM API cleanup

On Aug 22, 2024, at 1:36 AM, Alexandra Wang <alexandra.wang.oss@gmail.com> wrote:

> I had to make some changes to the first two patches in order to run
> "make check" and compile the treeb code on my machine. I’ve attached
> my changes.

Thank you for the review, and the patches!

> "make installcheck" for treeb is causing issues on my end. I can
> investigate further if it’s not a problem for others.

The test module index AMs are not intended for use in any installed database, so 'make installcheck' is unnecessary. A mere 'make check' should suffice. However, if you want to run it, you can install the modules, edit postgresql.conf to add 'treeb' to shared_preload_libraries, restart the server, and run 'make installcheck'. This is necessary for 'treeb' because it requests shared memory, and that needs to be done at startup.

The v18 patch set includes the changes your patches suggest, though I modified the approach a bit. Specifically, rather than standardizing on '1.0.0' for the module versions, as your patches do, I went with '1.0', as is standard in other modules in neighboring directories. The '1.0.0' numbering was something I had been using in versions 1..16 of this patch, and I only partially converted to '1.0' before posting v17, so sorry about that. The v18 patch also has some whitespace fixes.

To address your comments about the noise in the test failures, v18 modifies the clone_tests.pl script to do a little more work translating the expected output to expect the module's AM name ("xash", "xtree", "treeb", or whatnot) beyond what that script did in v17.
