amcheck support for BRIN indexes

Started by Arseniy Mukhin12 months ago22 messageshackers
Jump to latest
#1Arseniy Mukhin
arseniy.mukhin.dev@gmail.com

Hi,

It's not for v18, just wanted to share with the community and register
it in the upcoming Commitfest 2025-07.

Here is the patch with amcheck support for BRIN indexes.

Patch uses amcheck common infrastructure that was introduced in [1]/messages/by-id/45AC9B0A-2B45-40EE-B08F-BDCF5739D1E1@yandex-team.ru.
It works and I deleted all the code that
I copied from btree check at first. Great.

During the check we hold ShareUpdateExclusiveLock, so we don't block
regular reads/inserts/updates
and the same time range summarizations/desummarizations are impossible
during the check which simplifies check logic.
While checking we do ereport(ERROR) on the first issue, the same way
as btree, gin checks do.

There are two parts:

First part is 'index structure check':
1) Meta page check
2) Revmap check. Walk revmap and check every valid revmap item points
to the index tuple with the expected range blkno,
and index tuple is consistent with the tuple description. Also it's
not documented, but known from the brin code that
for every empty range we should have allnulls = true, hasnulls =
false. So this is also checked here.
3) Regular pages check. Walk regular pages and check that every index
tuple has a corresponding revmap item that points to it.
We don't check index tuple structures here, because it was already
done in 2 steps.

Regular pages check is optional. Orphan index tuple errors in this
check doesn't necessary mean that index is corrupted,
but AFAIS brin is not supposed to have such orphan index tuples now,
so if we encounter one than probably something
wrong with the index.

Second part is 'all consistent check':
We check all heap tuples are consistent with the index. It's the most
expensive part and it's optional.
Here we call consistent functions for every heap tuple. Also here we
check that fields like 'has_nulls', 'all_nulls',
'empty_range' are consistent with what we see in the heap.

There are two patch files:

0001-brin-refactoring.patch

It's just two tiny changes in the brin code.
1) For index tuple structure check we need to know how on-disk index
tuples look like.
Function that lets you get it 'brtuple_disk_tupdesc' is internal. This
patch makes it extern.
2) Create macros for BRIN_MAX_PAGES_PER_RANGE. For the meta page check.

0002-amechek-brin-support.patch

It's check functionality itself + regression tests + amcheck extension updates.

Some open questions:

1) How to get the correct strategy number for the scan key in "all
heap consistent" check. The consistent function wants
a scan key, and to generate it we have to pick a strategy number. We
can't just use the same strategy number for all
indexes because its meaning differs from opclass to opclass (for
example equal strategy number is 1 in Bloom
and 3 in min_max). We also can't pick an operator and use it for every
check, because opclasses don't have any requirements
about what operators they should support. The solution was to let user
to define check operator
(parameter consistent_operator_names). It's an array as we can have a
multicolumn index. We can use '=' as default check
operator, because AFAIS '=' operator is supported by all core
opclasses except 'box_inclusion_ops', and IMHO it's the
most obvious candidate for such a check. So if param
'consistent_operator_names' is an empty array (param default value),
then for all attributes we use operator '='. In most cases operator
'=' does the job and you don't need to worry about
consistent_operator_names param. Not sure about it, what do you think?

2) The current implementation of "all heap consistent" uses the scan
of the entire heap. So if we have a lot of
unsummarized ranges, we do a lot of wasted work because we can't use
the tuples that belong to the unsummarized ranges.
Instead of having one scan for the entire heap, we can walk the
revmap, take only the summarized ranges, and
scan only the pages that belong to those ranges. So we have one scan
per range. This approach helps us avoid touching
those heap tuples that we can't use for index checks. But I'm not sure
if we should to worry about that because every
autovacuum summarizes all the unsummarized ranges, so don't expect a
lot of unsummarized ranges on average.

TODO list:

1) add TAP tests
2) update SGML docs for amcheck (think it's better to do after patch
is reviewed and more or less finalized)
3) pg_amcheck integration

Big thanks to Tomas Vondra for the first patch idea and initial review.

[1]: /messages/by-id/45AC9B0A-2B45-40EE-B08F-BDCF5739D1E1@yandex-team.ru

Best regards,
Arseniy Mukhin

Attachments:

0002-amcheck-brin-support.patchtext/x-patch; charset=US-ASCII; name=0002-amcheck-brin-support.patchDownload+1533-4
0001-brin-refactoring.patchtext/x-patch; charset=US-ASCII; name=0001-brin-refactoring.patchDownload+6-3
#2Arseniy Mukhin
arseniy.mukhin.dev@gmail.com
In reply to: Arseniy Mukhin (#1)
Re: amcheck support for BRIN indexes

Hi,

Here is a new version.

TAP tests were added. Tried to reproduce more or less every violation.
I don't include 2 tests where disk representation ItemIdData needs to
be changed: it works locally, but I don't think these tests are
portable. While writing tests some minor issues were found and fixed.
Also ci compiler warnings were fixed.

Best regards,
Arseniy Mukhin

Attachments:

v2-0001-brin-refactoring.patchapplication/x-patch; name=v2-0001-brin-refactoring.patchDownload+6-3
v2-0002-amcheck-brin-support.patchapplication/x-patch; name=v2-0002-amcheck-brin-support.patchDownload+1874-4
#3Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Arseniy Mukhin (#2)
Re: amcheck support for BRIN indexes

On 6/8/25 14:39, Arseniy Mukhin wrote:

Hi,

Here is a new version.

TAP tests were added. Tried to reproduce more or less every violation.
I don't include 2 tests where disk representation ItemIdData needs to
be changed: it works locally, but I don't think these tests are
portable. While writing tests some minor issues were found and fixed.
Also ci compiler warnings were fixed.

Thanks. I've added myself as a reviewer, so that I don't forget about
this for the next CF.

regards

--
Tomas Vondra

#4Andrey Borodin
amborodin@acm.org
In reply to: Arseniy Mukhin (#2)
Re: amcheck support for BRIN indexes

Hi!

Nice feature!

On 8 Jun 2025, at 17:39, Arseniy Mukhin <arseniy.mukhin.dev@gmail.com> wrote:

<v2-0001-brin-refactoring.patch>

+#define BRIN_MAX_PAGES_PER_RANGE 131072

I'd personally prefer some words on where does this limit come from. I'm not a BRIN pro, just curious.
Or, at least, maybe we can use a form 128 * 1024, if it's just a sane limit.

On 8 Jun 2025, at 17:39, Arseniy Mukhin <arseniy.mukhin.dev@gmail.com> wrote:
<v2-0002-amcheck-brin-support.patch>

A bit more detailed commit message would be very useful.

The test coverage is very decent. The number of possible corruptions in tests is impressive. I don't really have an experience with BRIN to say how different they are, but I want to ask if you are sure that these types of corruption will be portable across big-endian machines and such stuff.

Copyright year in all new files should be 2025.

Some documentation about brin_index_check() would be handy, especially about its parameters. Perhaps, somewhere near gin_index_check() in amcheck.sgml.

brin_check_ereport() reports ERRCODE_DATA_CORRUPTED. We should distinguish between ERRCODE_INDEX_CORRUPTED which is a stormbringer and ERRCODE_DATA_CORRUPTED which is an actual disaster.

If it's not very difficult - it would be great to use read_stream infrastructure. See btvacuumscan() in nbtree.c calling read_stream_begin_relation() for example. We cannot use it in logical scans in B-tree\GiST\GIN, but maybe BRIN can take some advantage of this new shiny technology.

+		state->revmapbuf = ReadBufferExtended(idxrel, MAIN_FORKNUM, state->revmapBlk, RBM_NORMAL,
+											  state->checkstrategy);
+		LockBuffer(state->revmapbuf, BUFFER_LOCK_SHARE);
// usage of state->revmapbuf
+		LockBuffer(state->revmapbuf, BUFFER_LOCK_UNLOCK);
// more usage of state->revmapbuf
+		ReleaseBuffer(state->revmapbuf);

I hope you know what you are doing. BRIN concurrency is not known to me at all.

That's all for first pass through patches. Thanks for working on it!

Best regards, Andrey Borodin.

#5Arseniy Mukhin
arseniy.mukhin.dev@gmail.com
In reply to: Andrey Borodin (#4)
Re: amcheck support for BRIN indexes

On Mon, Jun 9, 2025 at 1:16 AM Tomas Vondra <tomas@vondra.me> wrote:

On 6/8/25 14:39, Arseniy Mukhin wrote:

Hi,

Here is a new version.

TAP tests were added. Tried to reproduce more or less every violation.
I don't include 2 tests where disk representation ItemIdData needs to
be changed: it works locally, but I don't think these tests are
portable. While writing tests some minor issues were found and fixed.
Also ci compiler warnings were fixed.

Thanks. I've added myself as a reviewer, so that I don't forget about
this for the next CF.

Thank you, looking forward to hearing your thoughts.

On Mon, Jun 16, 2025 at 8:11 PM Andrey Borodin <x4mmm@yandex-team.ru> wrote:

Hi!

Nice feature!

Hi Andrey, thank you for your interest in the patch!

On 8 Jun 2025, at 17:39, Arseniy Mukhin <arseniy.mukhin.dev@gmail.com> wrote:

<v2-0001-brin-refactoring.patch>

+#define BRIN_MAX_PAGES_PER_RANGE 131072

I'd personally prefer some words on where does this limit come from. I'm not a BRIN pro, just curious.
Or, at least, maybe we can use a form 128 * 1024, if it's just a sane limit.

Actually I don't know where this value came from, this limit already
exists in reloptions.c, so patch just creates macros so that it can be
reused in the check without duplication.

On 8 Jun 2025, at 17:39, Arseniy Mukhin <arseniy.mukhin.dev@gmail.com> wrote:
<v2-0002-amcheck-brin-support.patch>

A bit more detailed commit message would be very useful.

Agree, it was too concise. Added commit messages.

The test coverage is very decent. The number of possible corruptions in tests is impressive. I don't really have an experience with BRIN to say how different they are, but I want to ask if you are sure that these types of corruption will be portable across big-endian machines and such stuff.

Yeah, it seems that there are a lot of things that can go wrong with
the test's portability. What I think can be done to make it more
robust:
- using regular expressions to find places we want to corrupt instead
of concrete offsets. This way we avoid problems with different
alignments for 32 bit and 64 bit systems.
- using perl pack() function, so it uses the endianness of the system
where you run tests. This helps to avoid problems with different
endianness.
- don't touch things like varlena len that have different values on
different machines.
And if some test turned out to be not portable we can drop it, but at
least we would know that the code works, so it also would not be a
useless effort.

Copyright year in all new files should be 2025.

Fixed.

Some documentation about brin_index_check() would be handy, especially about its parameters. Perhaps, somewhere near gin_index_check() in amcheck.sgml.

Thanks, I'm gonna do it soon.

brin_check_ereport() reports ERRCODE_DATA_CORRUPTED. We should distinguish between ERRCODE_INDEX_CORRUPTED which is a stormbringer and ERRCODE_DATA_CORRUPTED which is an actual disaster.

Fixed. Interesting, I used btree check as reference when started
writing brin check, and in btree check there 53
ERRCODE_INDEX_CORRUPTED ereports and only 1 ERRCODE_DATA_CORRUPTED
ereport. So it was very hard to do, but I managed to pick the wrong
one. I wonder if this btree check ereport should also be changed to
ERRCODE_INDEX_CORRUPTED?

If it's not very difficult - it would be great to use read_stream infrastructure. See btvacuumscan() in nbtree.c calling read_stream_begin_relation() for example. We cannot use it in logical scans in B-tree\GiST\GIN, but maybe BRIN can take some advantage of this new shiny technology.

Thanks, I will look into it.

+               state->revmapbuf = ReadBufferExtended(idxrel, MAIN_FORKNUM, state->revmapBlk, RBM_NORMAL,
+                                                                                         state->checkstrategy);
+               LockBuffer(state->revmapbuf, BUFFER_LOCK_SHARE);
// usage of state->revmapbuf
+               LockBuffer(state->revmapbuf, BUFFER_LOCK_UNLOCK);
// more usage of state->revmapbuf
+               ReleaseBuffer(state->revmapbuf);

I hope you know what you are doing. BRIN concurrency is not known to me at all.

It seems alright, here we lock the revmap page again inside
check_revmap_item() for every revmap item.

That's all for first pass through patches. Thanks for working on it!

Thank you for the review!

Here is a new version of the patch.
It's rebased and It fixes points from Andrey's review. Two tests fail
on the 32 bit build, a new version should fix it. Also the
'all_heap_consistent' part was moved to a separate patch, it's about
400 lines of code, so I hope it is more reviewable now.
I tried the patch with postgis extension brin op classes and found a
small bug, the new version fixes it too.

Best regards,
Arseniy Mukhin

Attachments:

v3-0001-brin-refactoring.patchtext/x-patch; charset=US-ASCII; name=v3-0001-brin-refactoring.patchDownload+6-3
v3-0002-amcheck-brin_index_check-index-structure-check.patchtext/x-patch; charset=US-ASCII; name=v3-0002-amcheck-brin_index_check-index-structure-check.patchDownload+1360-9
v3-0003-amcheck-brin_index_check-all-heap-consistent.patchtext/x-patch; charset=US-ASCII; name=v3-0003-amcheck-brin_index_check-all-heap-consistent.patchDownload+554-22
#6Andrey Borodin
amborodin@acm.org
In reply to: Arseniy Mukhin (#5)
Re: amcheck support for BRIN indexes

On 18 Jun 2025, at 11:33, Arseniy Mukhin <arseniy.mukhin.dev@gmail.com> wrote:

Interesting, I used btree check as reference when started
writing brin check, and in btree check there 53
ERRCODE_INDEX_CORRUPTED ereports and only 1 ERRCODE_DATA_CORRUPTED
ereport. So it was very hard to do, but I managed to pick the wrong
one. I wonder if this btree check ereport should also be changed to
ERRCODE_INDEX_CORRUPTED?

It's there in a case of heapallindexes failure. I concur that ERRCODE_INDEX_CORRUPTED is more appropriate in that case in verify_nbtree.c.
But I recollect Peter explained this code before somewhere in pgsql-hackers. And the reasoning was something like "if you lack a tuple in unquie constraints - it's almost certainly subsequent constrain violation and data loss". But I'm not sure.
And I could not find this discussion in archives.

Best regards, Andrey Borodin.

#7Arseniy Mukhin
arseniy.mukhin.dev@gmail.com
In reply to: Andrey Borodin (#6)
Re: amcheck support for BRIN indexes

On Wed, Jun 18, 2025 at 2:39 PM Andrey Borodin <x4mmm@yandex-team.ru> wrote:

On 18 Jun 2025, at 11:33, Arseniy Mukhin <arseniy.mukhin.dev@gmail.com> wrote:

Interesting, I used btree check as reference when started
writing brin check, and in btree check there 53
ERRCODE_INDEX_CORRUPTED ereports and only 1 ERRCODE_DATA_CORRUPTED
ereport. So it was very hard to do, but I managed to pick the wrong
one. I wonder if this btree check ereport should also be changed to
ERRCODE_INDEX_CORRUPTED?

It's there in a case of heapallindexes failure. I concur that ERRCODE_INDEX_CORRUPTED is more appropriate in that case in verify_nbtree.c.
But I recollect Peter explained this code before somewhere in pgsql-hackers. And the reasoning was something like "if you lack a tuple in unquie constraints - it's almost certainly subsequent constrain violation and data loss". But I'm not sure.
And I could not find this discussion in archives.

There is a thread about heapallindexed feature [1]/messages/by-id/CAH2-WzmVKiwcNrhYFH9CTLLcmQTMH_xjW=AvxfDKAftmY47QKw@mail.gmail.com, I guess this is a
one you mentioned? Also it turned out that this error code is
explained in the code comment:

* Since the overall structure of the index has already been verified, the most
* likely explanation for error here is a corrupt heap page (could be logical
* or physical corruption). ...

Now I wonder if we need to use ERRCODE_DATA_CORRUPTED in the 'heap all
consistent' part? It's similar to btree heapallindexed check. We have
a structurally consistent index, but for some reason it is not
consistent with the heap. It seems to me it's impossible to say who we
should blame here. I leave ERRCODE_INDEX_CORRUPTED for now.

I noticed that fixes about year and error codes didn't get to the last
version for some reason, so there is a new version with fixes. Also I
changed the 'heap all consistent' error message "Index %s is
corrupted" -> "Index %s is not consistent with the heap". New message
looks less misleading as we don't know where the problem is. Thanks!

[1]: /messages/by-id/CAH2-WzmVKiwcNrhYFH9CTLLcmQTMH_xjW=AvxfDKAftmY47QKw@mail.gmail.com

Best regards,
Arseniy Mukhin

Attachments:

v4-0002-amcheck-brin_index_check-index-structure-check.patchtext/x-patch; charset=US-ASCII; name=v4-0002-amcheck-brin_index_check-index-structure-check.patchDownload+1360-9
v4-0003-amcheck-brin_index_check-heap-all-consistent.patchtext/x-patch; charset=US-ASCII; name=v4-0003-amcheck-brin_index_check-heap-all-consistent.patchDownload+554-22
v4-0001-brin-refactoring.patchtext/x-patch; charset=US-ASCII; name=v4-0001-brin-refactoring.patchDownload+6-3
#8Arseniy Mukhin
arseniy.mukhin.dev@gmail.com
In reply to: Arseniy Mukhin (#1)
Re: amcheck support for BRIN indexes

Hi,

I would like share some thoughts about 'heap all consistent' part and
one of the open questions:

The idea behind 'heap all consistent' is to use heap data to validate
the index. BRIN doesn't store heap tuples, so there is no
straightforward way to check if every tuple was indexed or not. We
have range data, so we need to do something with every heap tuple and
corresponding range. Something that will tell us if the range data
covers the heap tuple or not. What options I see here:

1) We can use the addValue() function. It returns FALSE if range data
was not changed (in other words range data already covers heap tuple
data that we pass to the function). It's very easy to do, we can use
heap tuples directly. But the problem I see here is that addValue()
can return TRUE even if heap tuple data have been already covered by
the range, but range data itself changed for some reason (some new
algorithm were applied for instance). So I think we can have some
false positives that we can do nothing about.

2) We can use the consistent() function. It requires ScanKey and
returns true if the range data satisfies ScanKey's condition. So we
need to convert every heap tuple into ScanKey somehow. This approach
is implemented now in the patch, so I tried to describe all details
about heap tuple to ScanKey conversion in the comment:

/*
* Prepare ScanKey for index attribute.
*
* ConsistentFn requires ScanKey, so we need to generate ScanKey for every
* attribute somehow. We want ScanKey that would result in TRUE for every heap
* tuple within the range when we use its indexed value as sk_argument.
* To generate such a ScanKey we need to define the right operand type
and the strategy number.
* Right operand type is a type of data that index is built on, so
it's 'opcintype'.
* There is no strategy number that we can always use,
* because every opclass defines its own set of operators it supports
and strategy number
* for the same operator can differ from opclass to opclass.
* So to get strategy number we look up an operator that gives us
desired behavior
* and which both operand types are 'opcintype' and then retrieve the
strategy number for it.
* Most of the time we can use '='. We let user define operator name
in case opclass doesn't
* support '=' operator. Also, if such operator doesn't exist, we
can't proceed with the check.
*
* Generated once, and will be reused for all heap tuples.
* Argument field will be filled for every heap tuple before
* consistent function invocation, so leave it NULL for a while.
*
*/

With this approach function brin_check() has optional parameter
'consistent_operator_names' that it seems to me could be very
confusing for users. In general I think this is the most complex
approach both in terms of use and implementation.

3) The approach that seems to me the most clear and straightforward:
to add new optional function to BRIN opclass API. The function that
would say if passed value is covered with the current range data. it's
exactly what we want to know, so we can use heap data directly here.
Something like that:

bool withinRange(BrinDesc *bdesc, BrinValues *column, Datum val, bool isnull)

It could be an optional function that will be implemented for all core
BRIN opclasses. So if somebody wants to use it for some custom opclass
they will need to implement it too, but it's not required. I
understand that adding something to the index opclass API requires
very strong arguments. So the argument here is that it will let to do
brin check very robust (without possible false positives as in the
first approach) and easy to use (no additional parameters in the check
function). Also, the withinRange() function could be written in such a
way that it would be more efficient for our task than addValue() or
consistent().

I'd be glad to hear your thoughts!

Best regards,
Arseniy Mukhin

#9Arseniy Mukhin
arseniy.mukhin.dev@gmail.com
In reply to: Arseniy Mukhin (#8)
Re: amcheck support for BRIN indexes

Hi!

On Wed, Jun 18, 2025 at 11:33 AM Arseniy Mukhin
<arseniy.mukhin.dev@gmail.com> wrote:

...
On Mon, Jun 16, 2025 at 8:11 PM Andrey Borodin <x4mmm@yandex-team.ru> wrote:
...

If it's not very difficult - it would be great to use read_stream infrastructure. See btvacuumscan() in nbtree.c calling read_stream_begin_relation() for example. We cannot use it in logical scans in B-tree\GiST\GIN, but maybe BRIN can take some advantage of this new shiny technology.

Thanks, I will look into it.

You are right, it was not very difficult to replace index sequential
scans with read_streams. Hope I picked correct stream_flag values.
Thank you!

On Sun, Jun 22, 2025 at 12:55 AM Arseniy Mukhin
<arseniy.mukhin.dev@gmail.com> wrote:

...
1) We can use the addValue() function. It returns FALSE if range data
was not changed (in other words range data already covers heap tuple
data that we pass to the function). It's very easy to do, we can use
heap tuples directly. But the problem I see here is that addValue()
can return TRUE even if heap tuple data have been already covered by
the range, but range data itself changed for some reason (some new
algorithm were applied for instance). So I think we can have some
false positives that we can do nothing about.

And yes, it's not an option really. It turned out that minmax_multi
can return true from addValue() even if it already contains the value.
So we can drop this option.

...
3) The approach that seems to me the most clear and straightforward:
to add new optional function to BRIN opclass API. The function that
would say if passed value is covered with the current range data. it's
exactly what we want to know, so we can use heap data directly here.
Something like that:

bool withinRange(BrinDesc *bdesc, BrinValues *column, Datum val, bool isnull)

It could be an optional function that will be implemented for all core
BRIN opclasses. So if somebody wants to use it for some custom opclass
they will need to implement it too, but it's not required. I
understand that adding something to the index opclass API requires
very strong arguments. So the argument here is that it will let to do
brin check very robust (without possible false positives as in the
first approach) and easy to use (no additional parameters in the check
function). Also, the withinRange() function could be written in such a
way that it would be more efficient for our task than addValue() or
consistent().

I decided to give it a try and implement such a support function. It
was not very difficult since all necessary logic already exists in
addValue() and consistent() functions for all core operator classes.
The main doubt about this approach: we add something to the core just
to use it in the contrib module. But the logic of this method is very
common with what we already have there and probably it is not possible
to implement it outside of the core, because you need all opclass
internals etc.

So there is a new version. I renamed 'heap all consistent' -> 'heap
all indexed', as btree amcheck calls it. I think there is not much
point in using another name here. There are two new files:
0004 - adds new BRIN support function (withinRange).
0005 - migrate 'heap all indexed' from using consistent function to
new withinRange function.

Patch 0003 still has the old 'heap all indexed' implementation that
uses a consistent function (2nd approach). So If you want to have
'heap all indexed' using a consistent function - don't apply 0004 and
0005 patches.

Best regards,
Arseniy Mukhin

Attachments:

v5-0002-amcheck-brin_index_check-index-structure-check.patchtext/x-patch; charset=US-ASCII; name=v5-0002-amcheck-brin_index_check-index-structure-check.patchDownload+1408-9
v5-0005-using-withinRange-function-for-heap-all-indexed-c.patchtext/x-patch; charset=US-ASCII; name=v5-0005-using-withinRange-function-for-heap-all-indexed-c.patchDownload+57-249
v5-0003-amcheck-brin_index_check-heap-all-indexed.patchtext/x-patch; charset=US-ASCII; name=v5-0003-amcheck-brin_index_check-heap-all-indexed.patchDownload+563-32
v5-0001-brin-refactoring.patchtext/x-patch; charset=US-ASCII; name=v5-0001-brin-refactoring.patchDownload+6-3
v5-0004-Adds-new-BRIN-support-function-withinRange.patchtext/x-patch; charset=US-ASCII; name=v5-0004-Adds-new-BRIN-support-function-withinRange.patchDownload+509-42
#10Arseniy Mukhin
arseniy.mukhin.dev@gmail.com
In reply to: Arseniy Mukhin (#9)
Re: amcheck support for BRIN indexes

Sorry, forget to run a full test run with the new patch version. Some
tests were unhappy with the new unknown support function. Here the new
version with the fix.

Best regards,
Arseniy Mukhin

Attachments:

v6-0001-brin-refactoring.patchtext/x-patch; charset=US-ASCII; name=v6-0001-brin-refactoring.patchDownload+6-3
v6-0003-amcheck-brin_index_check-heap-all-indexed.patchtext/x-patch; charset=US-ASCII; name=v6-0003-amcheck-brin_index_check-heap-all-indexed.patchDownload+563-32
v6-0002-amcheck-brin_index_check-index-structure-check.patchtext/x-patch; charset=US-ASCII; name=v6-0002-amcheck-brin_index_check-index-structure-check.patchDownload+1408-9
v6-0005-using-withinRange-function-for-heap-all-indexed-c.patchtext/x-patch; charset=US-ASCII; name=v6-0005-using-withinRange-function-for-heap-all-indexed-c.patchDownload+57-249
v6-0004-Adds-new-BRIN-support-function-withinRange.patchtext/x-patch; charset=US-ASCII; name=v6-0004-Adds-new-BRIN-support-function-withinRange.patchDownload+510-42
#11Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Arseniy Mukhin (#10)
Re: amcheck support for BRIN indexes

On 2025-Jul-06, Arseniy Mukhin wrote:

Sorry, forget to run a full test run with the new patch version. Some
tests were unhappy with the new unknown support function. Here the new
version with the fix.

Hello, I think this patch is probably a good idea. I don't think it
makes sense to introduce a bunch of code in 0003 only to rewrite it
completely in 0005. I would ask that you re-split your WITHIN_RANGE
(0004) to appear before the amcheck code, and then write the amcheck
code using that new functionality.

/*
* Return a tuple descriptor used for on-disk storage of BRIN tuples.
*/
-static TupleDesc
+TupleDesc
brtuple_disk_tupdesc(BrinDesc *brdesc)

I think we should give this function a better name if it's going to be
exported. How about brin_tuple_tupdesc? (in brin_tuple.h we
seem to distinguish "brin tuples" which are the stored ones, from "brin
mem tuples" which are the ones to be used in memory.)

I didn't read the other patches.

Thanks

--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
"I'm impressed how quickly you are fixing this obscure issue. I came from
MS SQL and it would be hard for me to put into words how much of a better job
you all are doing on [PostgreSQL]."
Steve Midgley, http://archives.postgresql.org/pgsql-sql/2008-08/msg00000.php

#12Arseniy Mukhin
arseniy.mukhin.dev@gmail.com
In reply to: Alvaro Herrera (#11)
Re: amcheck support for BRIN indexes

On Sun, Jul 6, 2025 at 10:49 PM Álvaro Herrera <alvherre@kurilemu.de> wrote:

On 2025-Jul-06, Arseniy Mukhin wrote:

Sorry, forget to run a full test run with the new patch version. Some
tests were unhappy with the new unknown support function. Here the new
version with the fix.

Hello, I think this patch is probably a good idea. I don't think it
makes sense to introduce a bunch of code in 0003 only to rewrite it
completely in 0005. I would ask that you re-split your WITHIN_RANGE
(0004) to appear before the amcheck code, and then write the amcheck
code using that new functionality.

Hi, Álvaro!

Thank you for looking into this.

OK, we can easily revert to the version with consistent function if
needed, so let's get rid of it.

/*
* Return a tuple descriptor used for on-disk storage of BRIN tuples.
*/
-static TupleDesc
+TupleDesc
brtuple_disk_tupdesc(BrinDesc *brdesc)

I think we should give this function a better name if it's going to be
exported. How about brin_tuple_tupdesc? (in brin_tuple.h we
seem to distinguish "brin tuples" which are the stored ones, from "brin
mem tuples" which are the ones to be used in memory.)

'brin_tuple_tupdesc' sounds good to me. Done.

So here is a new version. 0001, 0002 - index structure check. 0003,
0004 - all heap indexed using WITHIN_RANGE approach.

Thank you!

Best regards,
Arseniy Mukhin

Attachments:

v7-0001-brin-refactoring.patchtext/x-patch; charset=US-ASCII; name=v7-0001-brin-refactoring.patchDownload+10-7
v7-0002-amcheck-brin_index_check-index-structure-check.patchtext/x-patch; charset=US-ASCII; name=v7-0002-amcheck-brin_index_check-index-structure-check.patchDownload+1407-9
v7-0003-Adds-new-BRIN-support-function-withinRange.patchtext/x-patch; charset=US-ASCII; name=v7-0003-Adds-new-BRIN-support-function-withinRange.patchDownload+510-42
v7-0004-amcheck-brin_index_check-heap-all-indexed.patchtext/x-patch; charset=US-ASCII; name=v7-0004-amcheck-brin_index_check-heap-all-indexed.patchDownload+377-37
#13Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Arseniy Mukhin (#12)
Re: amcheck support for BRIN indexes

On 7/7/25 13:06, Arseniy Mukhin wrote:

On Sun, Jul 6, 2025 at 10:49 PM Álvaro Herrera <alvherre@kurilemu.de> wrote:

On 2025-Jul-06, Arseniy Mukhin wrote:

Sorry, forget to run a full test run with the new patch version. Some
tests were unhappy with the new unknown support function. Here the new
version with the fix.

Hello, I think this patch is probably a good idea. I don't think it
makes sense to introduce a bunch of code in 0003 only to rewrite it
completely in 0005. I would ask that you re-split your WITHIN_RANGE
(0004) to appear before the amcheck code, and then write the amcheck
code using that new functionality.

Hi, Álvaro!

Thank you for looking into this.

OK, we can easily revert to the version with consistent function if
needed, so let's get rid of it.

Alvaro, what's your opinion on the introduction of the new WITHIN_RANGE?
I'd probably try to do this using the regular consistent function:

(a) we don't need to add stuff to all BRIN opclasses to support this

(b) it gives us additional testing of the consistent function

(c) building a scan key for equality seems pretty trivial

What do you think?

--
Tomas Vondra

#14Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Tomas Vondra (#13)
Re: amcheck support for BRIN indexes

On 2025-Jul-07, Tomas Vondra wrote:

Alvaro, what's your opinion on the introduction of the new WITHIN_RANGE?
I'd probably try to do this using the regular consistent function:

(a) we don't need to add stuff to all BRIN opclasses to support this

(b) it gives us additional testing of the consistent function

(c) building a scan key for equality seems pretty trivial

What do you think?

Oh yeah, if we can build this on top of the existing primitives, then
I'm all for that.

--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/

#15Arseniy Mukhin
arseniy.mukhin.dev@gmail.com
In reply to: Alvaro Herrera (#14)
Re: amcheck support for BRIN indexes

On Mon, Jul 7, 2025 at 3:21 PM Álvaro Herrera <alvherre@kurilemu.de> wrote:

On 2025-Jul-07, Tomas Vondra wrote:

Alvaro, what's your opinion on the introduction of the new WITHIN_RANGE?
I'd probably try to do this using the regular consistent function:

(a) we don't need to add stuff to all BRIN opclasses to support this

(b) it gives us additional testing of the consistent function

(c) building a scan key for equality seems pretty trivial

What do you think?

Oh yeah, if we can build this on top of the existing primitives, then
I'm all for that.

Thank you for the feedback! I agree with the benefits. Speaking of
(с), it seems most of the time to be really trivial to build such a
ScanKey, but not every opclass supports '=' operator. amcheck should
handle these cases somehow then. I see two options here. The first is
to not provide 'heap all indexed' check for such opclasses, which is
sad because even one core opclass (box_inclusion_ops) doesn't support
'=' operator, postgis brin opclasses don't support it too AFAICS. The
second option is to let the user define which operator to use during
the check, which, I think, makes user experience much worse in this
case. So both options look not good from the user POV as for me, so I
don't know. What do you think about it?

And should I revert the patchset to the consistent function version then?

Best regards,
Arseniy Mukhin

#16Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Arseniy Mukhin (#15)
Re: amcheck support for BRIN indexes

On 7/8/25 14:40, Arseniy Mukhin wrote:

On Mon, Jul 7, 2025 at 3:21 PM Álvaro Herrera <alvherre@kurilemu.de> wrote:

On 2025-Jul-07, Tomas Vondra wrote:

Alvaro, what's your opinion on the introduction of the new WITHIN_RANGE?
I'd probably try to do this using the regular consistent function:

(a) we don't need to add stuff to all BRIN opclasses to support this

(b) it gives us additional testing of the consistent function

(c) building a scan key for equality seems pretty trivial

What do you think?

Oh yeah, if we can build this on top of the existing primitives, then
I'm all for that.

Thank you for the feedback! I agree with the benefits. Speaking of
(с), it seems most of the time to be really trivial to build such a
ScanKey, but not every opclass supports '=' operator. amcheck should
handle these cases somehow then. I see two options here. The first is
to not provide 'heap all indexed' check for such opclasses, which is
sad because even one core opclass (box_inclusion_ops) doesn't support
'=' operator, postgis brin opclasses don't support it too AFAICS. The
second option is to let the user define which operator to use during
the check, which, I think, makes user experience much worse in this
case. So both options look not good from the user POV as for me, so I
don't know. What do you think about it?

And should I revert the patchset to the consistent function version then?

Yeah, that's a good point. The various opclasses may support different
operators, and we don't know which "strategy" to fill into the scan key.
Minmax needs BTEqualStrategyNumber, inclusion RTContainsStrategyNumber,
and so on.

I wonder if there's a way to figure this out from the catalogs, but I
can't think of anything. Maybe requiring the user to supply the operator
(and not checking heap if it's not provided)

SELECT brin_index_check(..., '@>');

would not be too bad ...

Alvaro, any ideas?

--
Tomas Vondra

#17Arseniy Mukhin
arseniy.mukhin.dev@gmail.com
In reply to: Tomas Vondra (#16)
Re: amcheck support for BRIN indexes

On Tue, Jul 8, 2025 at 4:21 PM Tomas Vondra <tomas@vondra.me> wrote:

On 7/8/25 14:40, Arseniy Mukhin wrote:

On Mon, Jul 7, 2025 at 3:21 PM Álvaro Herrera <alvherre@kurilemu.de> wrote:

On 2025-Jul-07, Tomas Vondra wrote:

Alvaro, what's your opinion on the introduction of the new WITHIN_RANGE?
I'd probably try to do this using the regular consistent function:

(a) we don't need to add stuff to all BRIN opclasses to support this

(b) it gives us additional testing of the consistent function

(c) building a scan key for equality seems pretty trivial

What do you think?

Oh yeah, if we can build this on top of the existing primitives, then
I'm all for that.

Thank you for the feedback! I agree with the benefits. Speaking of
(с), it seems most of the time to be really trivial to build such a
ScanKey, but not every opclass supports '=' operator. amcheck should
handle these cases somehow then. I see two options here. The first is
to not provide 'heap all indexed' check for such opclasses, which is
sad because even one core opclass (box_inclusion_ops) doesn't support
'=' operator, postgis brin opclasses don't support it too AFAICS. The
second option is to let the user define which operator to use during
the check, which, I think, makes user experience much worse in this
case. So both options look not good from the user POV as for me, so I
don't know. What do you think about it?

And should I revert the patchset to the consistent function version then?

Yeah, that's a good point. The various opclasses may support different
operators, and we don't know which "strategy" to fill into the scan key.
Minmax needs BTEqualStrategyNumber, inclusion RTContainsStrategyNumber,
and so on.

I wonder if there's a way to figure this out from the catalogs, but I
can't think of anything. Maybe requiring the user to supply the operator
(and not checking heap if it's not provided)

SELECT brin_index_check(..., '@>');

would not be too bad ...

This doesn't change much, but it seems we need an array of operators
in the case of a multi-column index. And one more thing, it's
theoretically possible that opclass doesn't have the operator we need
at all. I'm not aware of such cases, but perhaps in the future
something like GIN tsvector_ops could be created for BRIN.
tsvector_ops doesn't have an operator that could be used here.

Best regards,
Arseniy Mukhin

#18Arseniy Mukhin
arseniy.mukhin.dev@gmail.com
In reply to: Arseniy Mukhin (#17)
Re: amcheck support for BRIN indexes

Hi,

While reviewing gist amcheck patch [1]/messages/by-id/41F2A10C-4577-413B-9140-BE81CCE04A60@yandex-team.ru I realized that brin amcheck
also must check if current snapshot is OK with index indcheckxmin (as
btree, gist do it). Currently this check is contained in btree amcheck
code, but other AMs need it for heapallindexed as well, so I moved it
from btree to verify_common (0003 patch).

Also I returned a consistentFn approach in heapallindexed as it seems
more preferable. But it's not a big deal to return to the within_range
approach if needed.

[1]: /messages/by-id/41F2A10C-4577-413B-9140-BE81CCE04A60@yandex-team.ru

Best regards,
Arseniy Mukhin

Attachments:

v8-0001-brin-refactoring.patchtext/x-patch; charset=US-ASCII; name=v8-0001-brin-refactoring.patchDownload+10-7
v8-0004-amcheck-brin_index_check-heap-all-indexed.patchtext/x-patch; charset=US-ASCII; name=v8-0004-amcheck-brin_index_check-heap-all-indexed.patchDownload+565-27
v8-0002-amcheck-brin_index_check-index-structure-check.patchtext/x-patch; charset=US-ASCII; name=v8-0002-amcheck-brin_index_check-index-structure-check.patchDownload+1407-9
v8-0003-amcheck-common_verify-register-snapshot-with-indc.patchtext/x-patch; charset=US-ASCII; name=v8-0003-amcheck-common_verify-register-snapshot-with-indc.patchDownload+32-23
#19Arseniy Mukhin
arseniy.mukhin.dev@gmail.com
In reply to: Arseniy Mukhin (#18)
Re: amcheck support for BRIN indexes

Hi,

On Tue, Jul 22, 2025 at 6:43 PM Arseniy Mukhin
<arseniy.mukhin.dev@gmail.com> wrote:

Hi,

While reviewing gist amcheck patch [1] I realized that brin amcheck
also must check if current snapshot is OK with index indcheckxmin (as
btree, gist do it). Currently this check is contained in btree amcheck
code, but other AMs need it for heapallindexed as well, so I moved it
from btree to verify_common (0003 patch).

There was a compiler warning on CI, so there is a new version with the
fix (adds #include to verify_common).
I also moved it to PG19-2.

Best regards,
Arseniy Mukhin

#20Arseniy Mukhin
arseniy.mukhin.dev@gmail.com
In reply to: Arseniy Mukhin (#19)
Re: amcheck support for BRIN indexes

On Fri, Aug 1, 2025 at 11:22 PM Arseniy Mukhin
<arseniy.mukhin.dev@gmail.com> wrote:

Hi,

On Tue, Jul 22, 2025 at 6:43 PM Arseniy Mukhin
<arseniy.mukhin.dev@gmail.com> wrote:

Hi,

While reviewing gist amcheck patch [1] I realized that brin amcheck
also must check if current snapshot is OK with index indcheckxmin (as
btree, gist do it). Currently this check is contained in btree amcheck
code, but other AMs need it for heapallindexed as well, so I moved it
from btree to verify_common (0003 patch).

There was a compiler warning on CI, so there is a new version with the
fix (adds #include to verify_common).
I also moved it to PG19-2.

Sorry for the noise, forgot to attach the files.

Best regards,
Arseniy Mukhin

Attachments:

v9-0001-brin-refactoring.patchtext/x-patch; charset=US-ASCII; name=v9-0001-brin-refactoring.patchDownload+10-7
v9-0004-amcheck-brin_index_check-heap-all-indexed.patchtext/x-patch; charset=US-ASCII; name=v9-0004-amcheck-brin_index_check-heap-all-indexed.patchDownload+565-27
v9-0003-amcheck-common_verify-register-snapshot-with-indc.patchtext/x-patch; charset=US-ASCII; name=v9-0003-amcheck-common_verify-register-snapshot-with-indc.patchDownload+33-23
v9-0002-amcheck-brin_index_check-index-structure-check.patchtext/x-patch; charset=US-ASCII; name=v9-0002-amcheck-brin_index_check-index-structure-check.patchDownload+1407-9
#21Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Tomas Vondra (#16)
#22Arseniy Mukhin
arseniy.mukhin.dev@gmail.com
In reply to: Alvaro Herrera (#21)