GiST nitpicks I want to discuss (and maybe eventually fix)
Hi hackers!
There are several things in GiST that trigger me when I think about
them. They are minor by themselves and maybe are not worth the trouble
fixing alone... But I decided to create a thread to discuss if we
should do anything about them.
0001:
Remove GistTuplesDeleted support from GiST
This bit flag value is unused since
68446b2c87a2aee5d8c2eb2aade7bb6d4195b7e1.
F_TUPLES_DELETED flag is not removed, since it is still used for
pageinspect. This is because we will still have this flag in mask
after a major upgrade, so we should keep page inspect code that
displays it. Patch merely suggests not to set it anymore.
(there was 0002, but I self-rejected it, because it is too picky)
0003:
Remove block field from gistxlogPageReuse walrecord.
GIST_REUSE_PAGE is spectail WAL record. It does not reference any
page, does not change any page in redo, its sole purpose is to kill
standby-queries which might still observe GiST index page we want to
reuse. It has `block` field which is simply not used for any purpose.
I propose to remove this field, reducing WAL footprint by an enormous
4 byte-per-record.
0004:
Reduce GIST_MAX_SPLIT_PAGES to some sane limit.
Value of GIST_MAX_SPLIT_PAGES is too big. There is no chance that GiST
page split will successfully produce 75 new pages, because there is an
upper limit of how many backup blocks a single WAL record can
reference (XLR_MAX_BLOCK_ID = 32). Patch reduces to
GIST_MAX_SPLIT_PAGES and adds static assertion for value.
Some history:
commit introducing GIST_MAX_SPLIT_PAGES [0]https://github.com/postgres/postgres/commit/2c03216d8311
commit introducing XLR_MAX_BLOCK_ID [1]
[0]: https://github.com/postgres/postgres/commit/2c03216d8311
I was able to create a case where split produces 6 pages. I was not
able to do more. So, I dont know if there is any real-life case where
this problem reproduces.
[0]: https://github.com/postgres/postgres/commit/2c03216d8311
[0]: https://github.com/postgres/postgres/commit/2c03216d8311
--
Best regards,
Kirill Reshke
Attachments:
v1-0004-Reduce-GIST_MAX_SPLIT_PAGES-to-some-sane-limit.patchapplication/octet-stream; name=v1-0004-Reduce-GIST_MAX_SPLIT_PAGES-to-some-sane-limit.patchDownload
From e5d2f734b8cb7c5cf4d25d979c5e829f7da1a10c Mon Sep 17 00:00:00 2001
From: reshke <reshke@double.cloud>
Date: Mon, 6 Oct 2025 17:27:25 +0000
Subject: [PATCH v1 4/4] Reduce GIST_MAX_SPLIT_PAGES to some sane limit.
There is a limit for of backup blocks that single wal
record can reference (XLR_MAX_BLOCK_ID). Enforce and check
this limit for GiST page split.
---
src/include/access/gist_private.h | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/src/include/access/gist_private.h b/src/include/access/gist_private.h
index 0a38d1ea4dc..9963a4bfd18 100644
--- a/src/include/access/gist_private.h
+++ b/src/include/access/gist_private.h
@@ -22,6 +22,7 @@
#include "storage/buffile.h"
#include "utils/hsearch.h"
#include "access/genam.h"
+#include "access/xlogrecord.h"
/*
* Maximum number of "halves" a page can be split into in one operation.
@@ -36,7 +37,14 @@
* so if you raise this higher than that limit, you'll just get a different
* error.
*/
-#define GIST_MAX_SPLIT_PAGES 75
+#define GIST_MAX_SPLIT_PAGES 25
+
+/* Enforce the size rule given in the comment above. +1 because zeroth backup block
+* it used for clearing follow-right pointer, blocks 1...GIST_MAX_SPLIT_PAGES for
+* split pages */
+StaticAssertDecl(GIST_MAX_SPLIT_PAGES + 1 <= XLR_MAX_BLOCK_ID,
+ "GIST_MAX_SPLIT_PAGES exceeds maximum allowed number of backup blocks");
+
/* Buffer lock modes */
#define GIST_SHARE BUFFER_LOCK_SHARE
--
2.43.0
v1-0003-Remove-block-field-from-gistxlogPageReuse-walreco.patchapplication/octet-stream; name=v1-0003-Remove-block-field-from-gistxlogPageReuse-walreco.patchDownload
From e49ef681d5e67cb0aa2c3fb22fe94311c7684e08 Mon Sep 17 00:00:00 2001
From: reshke <reshke@double.cloud>
Date: Mon, 6 Oct 2025 17:07:13 +0000
Subject: [PATCH v1 3/4] Remove block field from gistxlogPageReuse walrecord.
remove unused field, reduce wal footprint
---
src/backend/access/gist/gistutil.c | 2 +-
src/backend/access/gist/gistxlog.c | 4 +---
src/backend/access/rmgrdesc/gistdesc.c | 4 ++--
src/include/access/gist_private.h | 2 +-
src/include/access/gistxlog.h | 1 -
5 files changed, 5 insertions(+), 8 deletions(-)
diff --git a/src/backend/access/gist/gistutil.c b/src/backend/access/gist/gistutil.c
index cdc4ab3151b..3c057879519 100644
--- a/src/backend/access/gist/gistutil.c
+++ b/src/backend/access/gist/gistutil.c
@@ -864,7 +864,7 @@ gistNewBuffer(Relation r, Relation heaprel)
* page's deleteXid.
*/
if (XLogStandbyInfoActive() && RelationNeedsWAL(r))
- gistXLogPageReuse(r, heaprel, blkno, GistPageGetDeleteXid(page));
+ gistXLogPageReuse(r, heaprel, GistPageGetDeleteXid(page));
return buffer;
}
diff --git a/src/backend/access/gist/gistxlog.c b/src/backend/access/gist/gistxlog.c
index afbe4dbb274..51db4a0744e 100644
--- a/src/backend/access/gist/gistxlog.c
+++ b/src/backend/access/gist/gistxlog.c
@@ -588,8 +588,7 @@ gistXLogAssignLSN(void)
* Write XLOG record about reuse of a deleted page.
*/
void
-gistXLogPageReuse(Relation rel, Relation heaprel,
- BlockNumber blkno, FullTransactionId deleteXid)
+gistXLogPageReuse(Relation rel, Relation heaprel, FullTransactionId deleteXid)
{
gistxlogPageReuse xlrec_reuse;
@@ -602,7 +601,6 @@ gistXLogPageReuse(Relation rel, Relation heaprel,
/* XLOG stuff */
xlrec_reuse.isCatalogRel = RelationIsAccessibleInLogicalDecoding(heaprel);
xlrec_reuse.locator = rel->rd_locator;
- xlrec_reuse.block = blkno;
xlrec_reuse.snapshotConflictHorizon = deleteXid;
XLogBeginInsert();
diff --git a/src/backend/access/rmgrdesc/gistdesc.c b/src/backend/access/rmgrdesc/gistdesc.c
index 3e9eba01eb9..24332bb8b95 100644
--- a/src/backend/access/rmgrdesc/gistdesc.c
+++ b/src/backend/access/rmgrdesc/gistdesc.c
@@ -47,9 +47,9 @@ out_gistxlogPageUpdate(StringInfo buf, XLogReaderState *record, gistxlogPageUpda
static void
out_gistxlogPageReuse(StringInfo buf, gistxlogPageReuse *xlrec)
{
- appendStringInfo(buf, "rel %u/%u/%u; blk %u; snapshotConflictHorizon %u:%u, isCatalogRel %c",
+ appendStringInfo(buf, "rel %u/%u/%u; snapshotConflictHorizon %u:%u, isCatalogRel %c",
xlrec->locator.spcOid, xlrec->locator.dbOid,
- xlrec->locator.relNumber, xlrec->block,
+ xlrec->locator.relNumber,
EpochFromFullTransactionId(xlrec->snapshotConflictHorizon),
XidFromFullTransactionId(xlrec->snapshotConflictHorizon),
xlrec->isCatalogRel ? 'T' : 'F');
diff --git a/src/include/access/gist_private.h b/src/include/access/gist_private.h
index 39404ec7cdb..0a38d1ea4dc 100644
--- a/src/include/access/gist_private.h
+++ b/src/include/access/gist_private.h
@@ -440,7 +440,7 @@ extern XLogRecPtr gistXLogPageDelete(Buffer buffer,
FullTransactionId xid, Buffer parentBuffer,
OffsetNumber downlinkOffset);
-extern void gistXLogPageReuse(Relation rel, Relation heaprel, BlockNumber blkno,
+extern void gistXLogPageReuse(Relation rel, Relation heaprel,
FullTransactionId deleteXid);
extern XLogRecPtr gistXLogUpdate(Buffer buffer,
diff --git a/src/include/access/gistxlog.h b/src/include/access/gistxlog.h
index d551c0a19d4..717ac790b7c 100644
--- a/src/include/access/gistxlog.h
+++ b/src/include/access/gistxlog.h
@@ -99,7 +99,6 @@ typedef struct gistxlogPageDelete
typedef struct gistxlogPageReuse
{
RelFileLocator locator;
- BlockNumber block;
FullTransactionId snapshotConflictHorizon;
bool isCatalogRel; /* to handle recovery conflict during logical
* decoding on standby */
--
2.43.0
v1-0001-Remove-GistTuplesDeleted-support-from-GiST.patchapplication/octet-stream; name=v1-0001-Remove-GistTuplesDeleted-support-from-GiST.patchDownload
From 6239200bbe091fa3651bd7c2245f751563e83270 Mon Sep 17 00:00:00 2001
From: reshke <reshke@double.cloud>
Date: Mon, 6 Oct 2025 16:53:01 +0000
Subject: [PATCH v1 1/4] Remove GistTuplesDeleted support from GiST
This bit flag value is unused since
68446b2c87a2aee5d8c2eb2aade7bb6d4195b7e1.
F_TUPLES_DELETED flag is not removed, since it is still used for
pageinspect.
---
src/backend/access/gist/gistvacuum.c | 1 -
src/backend/access/gist/gistxlog.c | 3 ---
src/include/access/gist.h | 6 +-----
3 files changed, 1 insertion(+), 9 deletions(-)
diff --git a/src/backend/access/gist/gistvacuum.c b/src/backend/access/gist/gistvacuum.c
index b925eda2b9b..f48ac0bdf5a 100644
--- a/src/backend/access/gist/gistvacuum.c
+++ b/src/backend/access/gist/gistvacuum.c
@@ -401,7 +401,6 @@ restart:
MarkBufferDirty(buffer);
PageIndexMultiDelete(page, todelete, ntodelete);
- GistMarkTuplesDeleted(page);
if (RelationNeedsWAL(rel))
{
diff --git a/src/backend/access/gist/gistxlog.c b/src/backend/access/gist/gistxlog.c
index 42fee1f0764..afbe4dbb274 100644
--- a/src/backend/access/gist/gistxlog.c
+++ b/src/backend/access/gist/gistxlog.c
@@ -115,8 +115,6 @@ gistRedoPageUpdateRecord(XLogReaderState *record)
data += sizeof(OffsetNumber) * xldata->ntodelete;
PageIndexMultiDelete(page, todelete, xldata->ntodelete);
- if (GistPageIsLeaf(page))
- GistMarkTuplesDeleted(page);
}
/* Add new tuples if any */
@@ -206,7 +204,6 @@ gistRedoDeleteRecord(XLogReaderState *record)
PageIndexMultiDelete(page, toDelete, xldata->ntodelete);
GistClearPageHasGarbage(page);
- GistMarkTuplesDeleted(page);
PageSetLSN(page, lsn);
MarkBufferDirty(buffer);
diff --git a/src/include/access/gist.h b/src/include/access/gist.h
index b3f4e02cbfd..f60e0e58ec3 100644
--- a/src/include/access/gist.h
+++ b/src/include/access/gist.h
@@ -49,7 +49,7 @@
#define F_LEAF (1 << 0) /* leaf page */
#define F_DELETED (1 << 1) /* the page has been deleted */
#define F_TUPLES_DELETED (1 << 2) /* some tuples on the page were
- * deleted */
+ * deleted, currently unused */
#define F_FOLLOW_RIGHT (1 << 3) /* page to the right has no downlink */
#define F_HAS_GARBAGE (1 << 4) /* some tuples on the page are dead,
* but not deleted yet */
@@ -174,10 +174,6 @@ typedef struct GISTENTRY
#define GistPageIsDeleted(page) ( GistPageGetOpaque(page)->flags & F_DELETED)
-#define GistTuplesDeleted(page) ( GistPageGetOpaque(page)->flags & F_TUPLES_DELETED)
-#define GistMarkTuplesDeleted(page) ( GistPageGetOpaque(page)->flags |= F_TUPLES_DELETED)
-#define GistClearTuplesDeleted(page) ( GistPageGetOpaque(page)->flags &= ~F_TUPLES_DELETED)
-
#define GistPageHasGarbage(page) ( GistPageGetOpaque(page)->flags & F_HAS_GARBAGE)
#define GistMarkPageHasGarbage(page) ( GistPageGetOpaque(page)->flags |= F_HAS_GARBAGE)
#define GistClearPageHasGarbage(page) ( GistPageGetOpaque(page)->flags &= ~F_HAS_GARBAGE)
--
2.43.0
Hi Kirill!
On 6 Oct 2025, at 22:57, Kirill Reshke <reshkekirill@gmail.com> wrote:
0001:
Remove GistTuplesDeleted support from GiSTThis bit flag value is unused since
68446b2c87a2aee5d8c2eb2aade7bb6d4195b7e1.
F_TUPLES_DELETED flag is not removed, since it is still used for
pageinspect. This is because we will still have this flag in mask
after a major upgrade, so we should keep page inspect code that
displays it. Patch merely suggests not to set it anymore.
Makes sense. I hope one day we will add a catalog field to track index creation version. This would pave the way to get rid of invalid GiST tuples and return this flag too. We can use it for something better.
FTR, in commit message you could mention shorter version of hash 68446b2.
(there was 0002, but I self-rejected it, because it is too picky)
Out of curiosity, what was that?
0003:
Remove block field from gistxlogPageReuse walrecord.
GIST_REUSE_PAGE is spectail WAL record. It does not reference any
page, does not change any page in redo, its sole purpose is to kill
standby-queries which might still observe GiST index page we want to
reuse. It has `block` field which is simply not used for any purpose.
I propose to remove this field, reducing WAL footprint by an enormous
4 byte-per-record.
Yes, this is Hot-standby-only record. In offline conversation you mentioned that "RelFileLocator locator;" is not needed, only "Oid dbOid;". Maybe let's save a little more bytes?
0004:
Reduce GIST_MAX_SPLIT_PAGES to some sane limit.Value of GIST_MAX_SPLIT_PAGES is too big. There is no chance that GiST
page split will successfully produce 75 new pages, because there is an
upper limit of how many backup blocks a single WAL record can
reference (XLR_MAX_BLOCK_ID = 32). Patch reduces to
GIST_MAX_SPLIT_PAGES and adds static assertion for value.Some history:
commit introducing GIST_MAX_SPLIT_PAGES [0]
commit introducing XLR_MAX_BLOCK_ID [1][0] was committed before [1]
I was able to create a case where split produces 6 pages. I was not
able to do more. So, I dont know if there is any real-life case where
this problem reproduces.
IMHO, static assert is too fancy. Why not just #define GIST_MAX_SPLIT_PAGES (XLR_MAX_BLOCK_ID - 1) ?
Overall changes seem valuable, reducing GiST code complexity. Thank you!
Best regards, Andrey Borodin.
On Mon, 6 Oct 2025 at 23:53, Andrey Borodin <x4mmm@yandex-team.ru> wrote:
Thank you for review
Makes sense. I hope one day we will add a catalog field to track index creation version. This would pave the way to get rid of invalid GiST tuples and return this flag too. We can use it for something better.
In the GIN index we have an index creation version, and it is placed
on the metapage. So does btree, if i'm not mistaken . So, the index
version is stored in data, not in catalog. I doubt we will support two
technologies for one purpose. Since GiST index has no metapage, I
doubt we will be successful here.
Yes, this is Hot-standby-only record. In offline conversation you mentioned that "RelFileLocator locator;" is not needed, only "Oid dbOid;". Maybe let's save a little more bytes?
It is about ResolveRecoveryConflictWithSnapshot. This function
RelFileLocator arg and only uses database oid. Changing this is too
out-of-scope.
--
Best regards,
Kirill Reshke
Hi,
On Tue, Oct 07, 2025 at 12:02:21AM +0500, Kirill Reshke wrote:
On Mon, 6 Oct 2025 at 23:53, Andrey Borodin <x4mmm@yandex-team.ru> wrote:
Makes sense. I hope one day we will add a catalog field to track
index creation version. This would pave the way to get rid of
invalid GiST tuples and return this flag too. We can use it for
something better.In the GIN index we have an index creation version, and it is placed
on the metapage. So does btree, if i'm not mistaken . So, the index
version is stored in data, not in catalog. I doubt we will support two
technologies for one purpose. Since GiST index has no metapage, I
doubt we will be successful here.
I always found it klunky that you need (AFAIK) an extension
(pageinspect) to figure out the btree/gin index version via its
meta-page and ISTM that this would be useful information to be stored in
the catalog.
Michael