Fix gistkillitems & add regression test to microvacuum

Started by Kirill Reshke3 months ago11 messageshackers
Jump to latest
#1Kirill Reshke
reshkekirill@gmail.com

Hi hackers.

While looking at [0]coverage.postgresql.org/src/backend/access/gist/gistxlog.c.gcov.html I noticed that XLOG_GIST_DELETE & XLOG_GIST_PAGE_DELETE
records are not covered.

This thread addresses XLOG_GIST_DELETE, which is also known as a
microvacuum feature.

test.sql contains regression test that trigger this code to be
exercised in stream_regress.pl TAP test.

Test is as follows: we create a gist index on the table, then we
insert exactly 407 records, making the root page full (next insert
will trigger page split). Then I delete all tuples from relation and
trigger Index Only scan to do kill-on-select (killtuples). It marks
gist 0 page (which is root and is leaf) as has_garbage. Then, the next
insertion triggers xlog_gist_delete record.

To verify this I use pageinspect and pg_waldimp (locally). Also this
test is dependent on block size being 8192 which is not good.

And all of this does not work actually without v1-0001, because there
is a bug in GiST which does not call gistkillitmes for the very first
(root) page.

There is also test2.sql which inserts a single tuple, not 407. It can
be used to verify v1-0001.

[0]: coverage.postgresql.org/src/backend/access/gist/gistxlog.c.gcov.html

--
Best regards,
Kirill Reshke

Attachments:

test2.sqlapplication/octet-stream; name=test2.sqlDownload
test.sqlapplication/octet-stream; name=test.sqlDownload
v1-0001-Fix-gistkillitems-for-GiST-ROOT-page.patchapplication/octet-stream; name=v1-0001-Fix-gistkillitems-for-GiST-ROOT-page.patchDownload+4-4
#2Kirill Reshke
reshkekirill@gmail.com
In reply to: Kirill Reshke (#1)
Re: Fix gistkillitems & add regression test to microvacuum

On Thu, 15 Jan 2026 at 12:00, Kirill Reshke <reshkekirill@gmail.com> wrote:

Hi hackers.

While looking at [0] I noticed that XLOG_GIST_DELETE & XLOG_GIST_PAGE_DELETE
records are not covered.

This thread addresses XLOG_GIST_DELETE, which is also known as a
microvacuum feature.

test.sql contains regression test that trigger this code to be
exercised in stream_regress.pl TAP test.

Test is as follows: we create a gist index on the table, then we
insert exactly 407 records, making the root page full (next insert
will trigger page split). Then I delete all tuples from relation and
trigger Index Only scan to do kill-on-select (killtuples). It marks
gist 0 page (which is root and is leaf) as has_garbage. Then, the next
insertion triggers xlog_gist_delete record.

To verify this I use pageinspect and pg_waldimp (locally). Also this
test is dependent on block size being 8192 which is not good.

And all of this does not work actually without v1-0001, because there
is a bug in GiST which does not call gistkillitmes for the very first
(root) page.

There is also test2.sql which inserts a single tuple, not 407. It can
be used to verify v1-0001.

[0] coverage.postgresql.org/src/backend/access/gist/gistxlog.c.gcov.html

--
Best regards,
Kirill Reshke

From cf feedback it turns out we already have an isolation test for
this, and it does almost exactly the same.
And more, it fails.
Will try to fix

--
Best regards,
Kirill Reshke

#3Kirill Reshke
reshkekirill@gmail.com
In reply to: Kirill Reshke (#2)
Re: Fix gistkillitems & add regression test to microvacuum

On Thu, 15 Jan 2026 at 12:46, Kirill Reshke <reshkekirill@gmail.com> wrote:

On Thu, 15 Jan 2026 at 12:00, Kirill Reshke <reshkekirill@gmail.com> wrote:

Hi hackers.

While looking at [0] I noticed that XLOG_GIST_DELETE & XLOG_GIST_PAGE_DELETE
records are not covered.

This thread addresses XLOG_GIST_DELETE, which is also known as a
microvacuum feature.

test.sql contains regression test that trigger this code to be
exercised in stream_regress.pl TAP test.

Test is as follows: we create a gist index on the table, then we
insert exactly 407 records, making the root page full (next insert
will trigger page split). Then I delete all tuples from relation and
trigger Index Only scan to do kill-on-select (killtuples). It marks
gist 0 page (which is root and is leaf) as has_garbage. Then, the next
insertion triggers xlog_gist_delete record.

To verify this I use pageinspect and pg_waldimp (locally). Also this
test is dependent on block size being 8192 which is not good.

And all of this does not work actually without v1-0001, because there
is a bug in GiST which does not call gistkillitmes for the very first
(root) page.

There is also test2.sql which inserts a single tuple, not 407. It can
be used to verify v1-0001.

[0] coverage.postgresql.org/src/backend/access/gist/gistxlog.c.gcov.html

--
Best regards,
Kirill Reshke

From cf feedback it turns out we already have an isolation test for
this, and it does almost exactly the same.
And more, it fails.
Will try to fix

--
Best regards,
Kirill Reshke

This looks like gist does not work for small indexes and this is
explicitly tested after [0]/messages/by-id/lxzj26ga6ippdeunz6kuncectr5gfuugmm2ry22qu6hcx6oid6@lzx3sjsqhmt6
[0]: /messages/by-id/lxzj26ga6ippdeunz6kuncectr5gfuugmm2ry22qu6hcx6oid6@lzx3sjsqhmt6

--
Best regards,
Kirill Reshke

#4Kirill Reshke
reshkekirill@gmail.com
In reply to: Kirill Reshke (#3)
Re: Fix gistkillitems & add regression test to microvacuum

On Thu, 15 Jan 2026 at 13:21, Kirill Reshke <reshkekirill@gmail.com> wrote:

On Thu, 15 Jan 2026 at 12:46, Kirill Reshke <reshkekirill@gmail.com> wrote:

On Thu, 15 Jan 2026 at 12:00, Kirill Reshke <reshkekirill@gmail.com> wrote:

Hi hackers.

While looking at [0] I noticed that XLOG_GIST_DELETE & XLOG_GIST_PAGE_DELETE
records are not covered.

This thread addresses XLOG_GIST_DELETE, which is also known as a
microvacuum feature.

test.sql contains regression test that trigger this code to be
exercised in stream_regress.pl TAP test.

Test is as follows: we create a gist index on the table, then we
insert exactly 407 records, making the root page full (next insert
will trigger page split). Then I delete all tuples from relation and
trigger Index Only scan to do kill-on-select (killtuples). It marks
gist 0 page (which is root and is leaf) as has_garbage. Then, the next
insertion triggers xlog_gist_delete record.

To verify this I use pageinspect and pg_waldimp (locally). Also this
test is dependent on block size being 8192 which is not good.

And all of this does not work actually without v1-0001, because there
is a bug in GiST which does not call gistkillitmes for the very first
(root) page.

There is also test2.sql which inserts a single tuple, not 407. It can
be used to verify v1-0001.

[0] coverage.postgresql.org/src/backend/access/gist/gistxlog.c.gcov.html

--
Best regards,
Kirill Reshke

From cf feedback it turns out we already have an isolation test for
this, and it does almost exactly the same.
And more, it fails.
Will try to fix

--
Best regards,
Kirill Reshke

This looks like gist does not work for small indexes and this is
explicitly tested after [0]
[0] /messages/by-id/lxzj26ga6ippdeunz6kuncectr5gfuugmm2ry22qu6hcx6oid6@lzx3sjsqhmt6

--
Best regards,
Kirill Reshke

I was right on commit message of 377b7ab

"""
For gist some related paths were reached, but gist's implementation
seems to not work if all the dead tuples are on one page (or something
like that). The coverage for other index types was rather incidental.
"""

It does not work if all the dead tuples are on one page and this page is ROOT.

So, should we delete this

...
# Test gist, but with fewer rows - shows that killitems doesn't work anymore!
permutation
create_table fill_10 create_ext_btree_gist create_gist flush
disable_seq disable_bitmap

...

from isolation spec?

--
Best regards,
Kirill Reshke

#5Kirill Reshke
reshkekirill@gmail.com
In reply to: Kirill Reshke (#4)
Re: Fix gistkillitems & add regression test to microvacuum

On Thu, 15 Jan 2026 at 13:35, Kirill Reshke <reshkekirill@gmail.com> wrote:

On Thu, 15 Jan 2026 at 13:21, Kirill Reshke <reshkekirill@gmail.com> wrote:

On Thu, 15 Jan 2026 at 12:46, Kirill Reshke <reshkekirill@gmail.com> wrote:

On Thu, 15 Jan 2026 at 12:00, Kirill Reshke <reshkekirill@gmail.com> wrote:

Hi hackers.

While looking at [0] I noticed that XLOG_GIST_DELETE & XLOG_GIST_PAGE_DELETE
records are not covered.

This thread addresses XLOG_GIST_DELETE, which is also known as a
microvacuum feature.

test.sql contains regression test that trigger this code to be
exercised in stream_regress.pl TAP test.

Test is as follows: we create a gist index on the table, then we
insert exactly 407 records, making the root page full (next insert
will trigger page split). Then I delete all tuples from relation and
trigger Index Only scan to do kill-on-select (killtuples). It marks
gist 0 page (which is root and is leaf) as has_garbage. Then, the next
insertion triggers xlog_gist_delete record.

To verify this I use pageinspect and pg_waldimp (locally). Also this
test is dependent on block size being 8192 which is not good.

And all of this does not work actually without v1-0001, because there
is a bug in GiST which does not call gistkillitmes for the very first
(root) page.

There is also test2.sql which inserts a single tuple, not 407. It can
be used to verify v1-0001.

[0] coverage.postgresql.org/src/backend/access/gist/gistxlog.c.gcov.html

--
Best regards,
Kirill Reshke

From cf feedback it turns out we already have an isolation test for
this, and it does almost exactly the same.
And more, it fails.
Will try to fix

--
Best regards,
Kirill Reshke

This looks like gist does not work for small indexes and this is
explicitly tested after [0]
[0] /messages/by-id/lxzj26ga6ippdeunz6kuncectr5gfuugmm2ry22qu6hcx6oid6@lzx3sjsqhmt6

--
Best regards,
Kirill Reshke

I was right on commit message of 377b7ab

"""
For gist some related paths were reached, but gist's implementation
seems to not work if all the dead tuples are on one page (or something
like that). The coverage for other index types was rather incidental.
"""

It does not work if all the dead tuples are on one page and this page is ROOT.

So, should we delete this

...
# Test gist, but with fewer rows - shows that killitems doesn't work anymore!
permutation
create_table fill_10 create_ext_btree_gist create_gist flush
disable_seq disable_bitmap

...

from isolation spec?

--
Best regards,
Kirill Reshke

PFA v2 which leaves the test in-place.

Also commit message improvements.

--
Best regards,
Kirill Reshke

Attachments:

v2-0001-Fix-gistkillitems-for-GiST-ROOT-page.patchapplication/octet-stream; name=v2-0001-Fix-gistkillitems-for-GiST-ROOT-page.patchDownload+5-6
#6Andrey Borodin
amborodin@acm.org
In reply to: Kirill Reshke (#5)
Re: Fix gistkillitems & add regression test to microvacuum

On 15 Jan 2026, at 22:59, Kirill Reshke <reshkekirill@gmail.com> wrote:

PFA v2 which leaves the test in-place.

Also commit message improvements.

Yeah, killtuples for GiST root page is broken. Your patch is fixing it.
I don't think we should backpatch this, the bug is harmless, but for master the patch LGTM.
It would be good to assign so->curBlkno and so->curBlkno together. But gistScanPage() is the only place with access to the block number.

+# Test gist, but with fewer rows - that killitems used to be buggy.

Probably, in this comment we can explicitly say that killitems was buggy, but now is fixed.

Thanks!

Best regards, Andrey Borodin.

#7Kirill Reshke
reshkekirill@gmail.com
In reply to: Andrey Borodin (#6)
Re: Fix gistkillitems & add regression test to microvacuum

On Tue, 20 Jan 2026 at 15:30, Andrey Borodin <x4mmm@yandex-team.ru> wrote:

On 15 Jan 2026, at 22:59, Kirill Reshke <reshkekirill@gmail.com> wrote:

PFA v2 which leaves the test in-place.

Also commit message improvements.

Yeah, killtuples for GiST root page is broken. Your patch is fixing it.
I don't think we should backpatch this, the bug is harmless, but for master the patch LGTM.

Thank you

It would be good to assign so->curBlkno and so->curBlkno together. But gistScanPage() is the only place with access to the block number.

Sorry, didnt get this take.

+# Test gist, but with fewer rows - that killitems used to be buggy.

Probably, in this comment we can explicitly say that killitems was buggy, but now is fixed.

Hmm, what would be a good wording here?

--
Best regards,
Kirill Reshke

#8Andrey Borodin
amborodin@acm.org
In reply to: Kirill Reshke (#7)
Re: Fix gistkillitems & add regression test to microvacuum

On 23 Jan 2026, at 16:03, Kirill Reshke <reshkekirill@gmail.com> wrote:

On Tue, 20 Jan 2026 at 15:30, Andrey Borodin <x4mmm@yandex-team.ru> wrote:

On 15 Jan 2026, at 22:59, Kirill Reshke <reshkekirill@gmail.com> wrote:

PFA v2 which leaves the test in-place.

Also commit message improvements.

Yeah, killtuples for GiST root page is broken. Your patch is fixing it.
I don't think we should backpatch this, the bug is harmless, but for master the patch LGTM.

Thank you

It would be good to assign so->curBlkno and so->curBlkno together. But gistScanPage() is the only place with access to the block number.

Sorry, didnt get this take.

Sorry, I meant so->curBlkno and so->numKilled are semantically correlated. But it's difficult to assign them together and this does not worth refactoring.

+# Test gist, but with fewer rows - that killitems used to be buggy.

Probably, in this comment we can explicitly say that killitems was buggy, but now is fixed.

Hmm, what would be a good wording here?

I've opened an English dictionary and it says "used to" can be used with a meaning "bug was there but it's not anymore".

So the patch is RfC IMO.

Best regards, Andrey Borodin.

#9Soumya S Murali
soumyamurali.work@gmail.com
In reply to: Andrey Borodin (#8)
Re: Fix gistkillitems & add regression test to microvacuum

Hi all,

Thank you for the updated patch.

On Wed, Feb 11, 2026 at 2:52 PM Andrey Borodin <x4mmm@yandex-team.ru> wrote:

On 23 Jan 2026, at 16:03, Kirill Reshke <reshkekirill@gmail.com> wrote:

On Tue, 20 Jan 2026 at 15:30, Andrey Borodin <x4mmm@yandex-team.ru> wrote:

On 15 Jan 2026, at 22:59, Kirill Reshke <reshkekirill@gmail.com> wrote:

PFA v2 which leaves the test in-place.

Also commit message improvements.

Yeah, killtuples for GiST root page is broken. Your patch is fixing it.
I don't think we should backpatch this, the bug is harmless, but for master the patch LGTM.

Thank you

It would be good to assign so->curBlkno and so->curBlkno together. But gistScanPage() is the only place with access to the block number.

Sorry, didnt get this take.

Sorry, I meant so->curBlkno and so->numKilled are semantically correlated. But it's difficult to assign them together and this does not worth refactoring.

+# Test gist, but with fewer rows - that killitems used to be buggy.

Probably, in this comment we can explicitly say that killitems was buggy, but now is fixed.

Hmm, what would be a good wording here?

I've opened an English dictionary and it says "used to" can be used with a meaning "bug was there but it's not anymore".

So the patch is RfC IMO.

I reproduced the issue locally by filling a GiST root page, deleting
all tuples, and triggering a microvacuum by an index-only scan.
With the patch, The root page is now handled consistently with other
leaf pages. so->curBlkno and so->curPageLSN are properly set during
scan, and gistkillitems() operates safely and correctly on the root
page.
Based on runtime validation and code inspection, the fix LGTM.

Thanks,
Soumya

#10Kirill Reshke
reshkekirill@gmail.com
In reply to: Soumya S Murali (#9)
Re: Fix gistkillitems & add regression test to microvacuum

On Wed, 11 Feb 2026 at 17:18, Soumya S Murali
<soumyamurali.work@gmail.com> wrote:

Hi all,

Thank you for the updated patch.
I reproduced the issue locally by filling a GiST root page, deleting
all tuples, and triggering a microvacuum by an index-only scan.
With the patch, The root page is now handled consistently with other
leaf pages. so->curBlkno and so->curPageLSN are properly set during
scan, and gistkillitems() operates safely and correctly on the root
page.
Based on runtime validation and code inspection, the fix LGTM.

Rebased due to f5eb854ab6d[0]https://git.postgresql.org/cgit/postgresql.git/commit/?id=f5eb854ab6d6281ec2d3143657944bdda6676341

I have added a [0]https://git.postgresql.org/cgit/postgresql.git/commit/?id=f5eb854ab6d6281ec2d3143657944bdda6676341committer to CC.

Andres, can you please take a look at this thread, if by chance you
have spare time for it? You have changed kill tuples-related logic
frequently, so you might be interested...

[0]: https://git.postgresql.org/cgit/postgresql.git/commit/?id=f5eb854ab6d6281ec2d3143657944bdda6676341

--
Best regards,
Kirill Reshke

Attachments:

v3-0001-Fix-gistkillitems-for-GiST-ROOT-page.patchapplication/octet-stream; name=v3-0001-Fix-gistkillitems-for-GiST-ROOT-page.patchDownload+5-6
#11Soumya S Murali
soumyamurali.work@gmail.com
In reply to: Kirill Reshke (#10)
Re: Fix gistkillitems & add regression test to microvacuum

Hi all,

On Wed, Mar 18, 2026 at 11:12 AM Kirill Reshke <reshkekirill@gmail.com> wrote:

On Wed, 11 Feb 2026 at 17:18, Soumya S Murali
<soumyamurali.work@gmail.com> wrote:

Hi all,

Thank you for the updated patch.
I reproduced the issue locally by filling a GiST root page, deleting
all tuples, and triggering a microvacuum by an index-only scan.
With the patch, The root page is now handled consistently with other
leaf pages. so->curBlkno and so->curPageLSN are properly set during
scan, and gistkillitems() operates safely and correctly on the root
page.
Based on runtime validation and code inspection, the fix LGTM.

Rebased due to f5eb854ab6d[0]

I have added a [0]committer to CC.

Andres, can you please take a look at this thread, if by chance you
have spare time for it? You have changed kill tuples-related logic
frequently, so you might be interested...

[0] https://git.postgresql.org/cgit/postgresql.git/commit/?id=f5eb854ab6d6281ec2d3143657944bdda6676341

I performed a follow-up test on top of the v3 patch with a strictly
controlled setup to ensure a single-page GiST index (confirming that
block 1 is out of range). To force the cleanup path, I executed an
index-only scan after deleting all tuples and used a narrow WAL window
to isolate the relevant records. The WAL output shows GiST PAGE_UPDATE
on block 0, confirming that the root page is modified after the scan.
However even in this setup, I do not observe any XLOG_GIST_DELETE
records. This seems while the patch correctly initializes curBlkno and
enables proper handling of the root page, the DELETE WAL emission
depends on whether killitems actually marks tuples as DEAD under the
given visible conditions.
Overall, the fix looks correct from a code perspective, but in this
scenario the cleanup does not result in a DELETE WAL record.

Regards,
Soumya