[SP-]GiST IOS visibility bug (was: Why doesn't GiST require super-exclusive lock)
Hi,
In the original thread [0] we established that GiST and SP-GiST
support index-only scans, but don't have sufficient interlocking with
vacuum to prevent dead tuples from being revived in a concurrent scan.
As that thread already contains a lot of detail, I won't go into the
specifics here, but the gist (heh) of it is as follows:

(SP-)GiST index scans don't interlock with index vacuum in a way that
guarantees their index-only scans only return TIDs that are still
valid at the moment amgettuple returns them. Index vacuum can
therefore remove a TID, and VACUUM can mark its heap page ALL_VISIBLE,
after the index-only scan has buffered that TID but before the scan
has returned it.
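To make that concrete, the interleaving that the isolation tests in
patch 4 exercise looks roughly like this:

    1. An index-only scan reads an index page, buffers the matching
       TIDs (and reconstructed tuples) in local memory, and releases
       its pin on the index page.
    2. VACUUM removes the dead tuples from the index and the heap,
       then marks the affected heap pages ALL_VISIBLE.
    3. The scan returns a buffered TID through amgettuple; the IOS
       code sees the heap page marked ALL_VISIBLE in the visibility
       map and returns the already-removed tuple as visible, without
       ever visiting the heap.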
The keen-eyed reader may have noticed that this issue is very similar
to the issue fixed with 459e7bf8, and that's correct. Both are cases
where the visibility map was consulted to determine the visibility of
a TID without knowing for sure that the TID was still valid on that
heap page.
While the thread in [0] contains a very thorough fix, and (IMO) some
good additions to how index-only scans work with table AMs, the patch
currently proposed there changes things in non-backpatchable ways. So,
attached is a set of 4 patches designed for backpatching a fix.

The patches work similarly to those in [0], but with less invasive
code changes and no meaningful ABI breaks:
Index vacuum's lock requirements are increased to super-exclusive
(cleanup) locks. Additionally, index-only scans now do a visibility
check on each tuple they want to return, before the pin on the index
page that contains those tuples is released.
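For illustration, the per-tuple check on the index AM side boils down
to the following (condensed from the gistScanPage() and gistgettuple()
changes in the attached patch 2):

    /* while scanning the index page, with the pin still held */
    if (scan->xs_want_itup)
    {
        so->pageData[so->nPageData].recontup =
            gistFetchTuple(giststate, r, it);
        so->pageData[so->nPageData].visrecheck =
            table_index_vischeck_tuple(scan->heapRelation,
                                       &so->vmbuf, &it->t_tid);
    }

    /* ... and later, when amgettuple returns the buffered tuple */
    scan->xs_hitup = so->pageData[so->curPageData].recontup;
    scan->xs_visrecheck = so->pageData[so->curPageData].visrecheck;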
The visibility check is done on a tuple-by-tuple basis, through a new
function table_index_vischeck_tuple (so as not to completely break the
semantic barrier between indexes and tables regarding visibility
checks). In the patchset of [0] I added a table AM routine that lets
table AMs extend this, but that's not backportable, hence this
separate thread for discussing a backpatchable bugfix.
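On the consuming side, the executor's index-only scan then only has to
visit the heap when the cached check was inconclusive. Roughly, per
patch 1's changes to nodeIndexonlyscan.c:

    TMVC_Result vischeck = scandesc->xs_visrecheck;

    /* if the index AM didn't pre-check the tuple, check it now */
    if (vischeck == TMVC_Unchecked)
        vischeck = table_index_vischeck_tuple(scandesc->heapRelation,
                                              &node->ioss_VMBuffer,
                                              tid);

    if (vischeck == TMVC_MaybeVisible)
    {
        /* not known visible: we have to visit the heap, as before */
        if (!index_fetch_heap(scandesc, node->ioss_TableSlot))
            continue;    /* no visible tuple, try next index entry */
    }
    /* TMVC_Visible: use the index tuple without any heap fetch */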
The one exposed ABI change is the addition of a single-byte
"visrecheck" field in an alignment gap in IndexScanDescData - which
historically has been the place to put new fields required by
backpatched bugfixes. The size of the struct doesn't change, and there
should be no backward or forward compatibility issues for extensions
that make use of this new field, as long as they don't assume the
value of the field when they haven't set it themselves.
Kind regards,
Matthias van de Meent
Neon (https://neon.tech)
[0]: https://www.postgresql.org/message-id/flat/CAH2-Wz=PqOziyRSrnN5jAtfXWXY7-BJcHz9S355LH8Dt=5qxWQ@mail.gmail.com
Attachments:
v01-0002-GIST-Fix-visibility-issues-in-IOS.patch
From fdc9c6ce6bff50b0dfde3960f800a455d835d3d0 Mon Sep 17 00:00:00 2001
From: Matthias van de Meent <boekewurm+postgres@gmail.com>
Date: Fri, 25 Apr 2025 16:14:26 +0200
Subject: [PATCH v01 2/4] GIST: Fix visibility issues in IOS
Previously, GiST IOS could buffer tuples from index pages while VACUUM
came along, cleaned up an ALL_DEAD tuple, and marked the tuple's heap
page ALL_VISIBLE again, making IOS mistakenly believe the tuple is
visible.
With this commit, buffer pins now conflict with GiST vacuum, and we now
do visibility checks on heap items returned from index pages while we
still hold the pin, so that the IOS infrastructure knows to recheck the
heap page even if that heap page has become ALL_VISIBLE after we
released the index page.
Idea from Heikki Linnakangas
Backpatch: 17-
---
src/include/access/gist_private.h | 5 ++++
src/backend/access/gist/gistget.c | 36 ++++++++++++++++++++++++----
src/backend/access/gist/gistscan.c | 11 +++++++++
src/backend/access/gist/gistvacuum.c | 6 ++---
4 files changed, 50 insertions(+), 8 deletions(-)
diff --git a/src/include/access/gist_private.h b/src/include/access/gist_private.h
index 7b8749c8db0..78bee3a93f3 100644
--- a/src/include/access/gist_private.h
+++ b/src/include/access/gist_private.h
@@ -22,6 +22,7 @@
#include "storage/buffile.h"
#include "utils/hsearch.h"
#include "access/genam.h"
+#include "access/tableam.h"
/*
* Maximum number of "halves" a page can be split into in one operation.
@@ -124,6 +125,8 @@ typedef struct GISTSearchHeapItem
* index-only scans */
OffsetNumber offnum; /* track offset in page to mark tuple as
* LP_DEAD */
+ uint8 visrecheck; /* Cached visibility check result for this
+ * heap pointer. */
} GISTSearchHeapItem;
/* Unvisited item, either index page or heap tuple */
@@ -176,6 +179,8 @@ typedef struct GISTScanOpaqueData
OffsetNumber curPageData; /* next item to return */
MemoryContext pageDataCxt; /* context holding the fetched tuples, for
* index-only scans */
+ /* info used by Index-Only Scans */
+ Buffer vmbuf; /* reusable buffer for IOS' vm lookups */
} GISTScanOpaqueData;
typedef GISTScanOpaqueData *GISTScanOpaque;
diff --git a/src/backend/access/gist/gistget.c b/src/backend/access/gist/gistget.c
index b35b8a97577..8f6a680b010 100644
--- a/src/backend/access/gist/gistget.c
+++ b/src/backend/access/gist/gistget.c
@@ -462,8 +462,9 @@ gistScanPage(IndexScanDesc scan, GISTSearchItem *pageItem,
so->pageData[so->nPageData].offnum = i;
/*
- * In an index-only scan, also fetch the data from the tuple. The
- * reconstructed tuples are stored in pageDataCxt.
+ * In an index-only scan, also fetch the data from the tuple,
+ * and its visibility state. The reconstructed tuples are stored
+ * in pageDataCxt.
*/
if (scan->xs_want_itup)
{
@@ -471,6 +472,10 @@ gistScanPage(IndexScanDesc scan, GISTSearchItem *pageItem,
so->pageData[so->nPageData].recontup =
gistFetchTuple(giststate, r, it);
MemoryContextSwitchTo(oldcxt);
+
+ so->pageData[so->nPageData].visrecheck =
+ table_index_vischeck_tuple(scan->heapRelation,
+ &so->vmbuf, &it->t_tid);
}
so->nPageData++;
}
@@ -498,10 +503,16 @@ gistScanPage(IndexScanDesc scan, GISTSearchItem *pageItem,
item->data.heap.recheckDistances = recheck_distances;
/*
- * In an index-only scan, also fetch the data from the tuple.
+ * In an index-only scan, also fetch the data from the tuple,
+ * and its visibility state.
*/
if (scan->xs_want_itup)
+ {
item->data.heap.recontup = gistFetchTuple(giststate, r, it);
+ item->data.heap.visrecheck =
+ table_index_vischeck_tuple(scan->heapRelation,
+ &so->vmbuf, &it->t_tid);
+ }
}
else
{
@@ -586,9 +597,16 @@ getNextNearest(IndexScanDesc scan)
item->distances,
item->data.heap.recheckDistances);
- /* in an index-only scan, also return the reconstructed tuple. */
+ /*
+ * In an index-only scan, also return the reconstructed tuple
+ * and the visibility check we did when we still had a pin on
+ * the index page.
+ */
if (scan->xs_want_itup)
+ {
scan->xs_hitup = item->data.heap.recontup;
+ scan->xs_visrecheck = item->data.heap.visrecheck;
+ }
res = true;
}
else
@@ -671,9 +689,17 @@ gistgettuple(IndexScanDesc scan, ScanDirection dir)
scan->xs_heaptid = so->pageData[so->curPageData].heapPtr;
scan->xs_recheck = so->pageData[so->curPageData].recheck;
- /* in an index-only scan, also return the reconstructed tuple */
+ /*
+ * In an index-only scan, also return the reconstructed tuple
+ * and the visibility check we did when we still had a pin on
+ * the index page.
+ */
if (scan->xs_want_itup)
+ {
scan->xs_hitup = so->pageData[so->curPageData].recontup;
+ scan->xs_visrecheck =
+ so->pageData[so->curPageData].visrecheck;
+ }
so->curPageData++;
diff --git a/src/backend/access/gist/gistscan.c b/src/backend/access/gist/gistscan.c
index e05801e2f5b..97fecfcb1b2 100644
--- a/src/backend/access/gist/gistscan.c
+++ b/src/backend/access/gist/gistscan.c
@@ -341,6 +341,12 @@ gistrescan(IndexScanDesc scan, ScanKey key, int nkeys,
pfree(fn_extras);
}
+ if (BufferIsValid(so->vmbuf))
+ {
+ ReleaseBuffer(so->vmbuf);
+ so->vmbuf = InvalidBuffer;
+ }
+
/* any previous xs_hitup will have been pfree'd in context resets above */
scan->xs_hitup = NULL;
}
@@ -349,6 +355,11 @@ void
gistendscan(IndexScanDesc scan)
{
GISTScanOpaque so = (GISTScanOpaque) scan->opaque;
+ if (BufferIsValid(so->vmbuf))
+ {
+ ReleaseBuffer(so->vmbuf);
+ so->vmbuf = InvalidBuffer;
+ }
/*
* freeGISTstate is enough to clean up everything made by gistbeginscan,
diff --git a/src/backend/access/gist/gistvacuum.c b/src/backend/access/gist/gistvacuum.c
index 24fb94f473e..e0da6e37dca 100644
--- a/src/backend/access/gist/gistvacuum.c
+++ b/src/backend/access/gist/gistvacuum.c
@@ -289,10 +289,10 @@ restart:
info->strategy);
/*
- * We are not going to stay here for a long time, aggressively grab an
- * exclusive lock.
+ * We are not going to stay here for a long time, aggressively grab a
+ * cleanup lock.
*/
- LockBuffer(buffer, GIST_EXCLUSIVE);
+ LockBufferForCleanup(buffer);
page = (Page) BufferGetPage(buffer);
if (gistPageRecyclable(page))
--
2.45.2
v01-0001-Expose-visibility-checking-shim-for-index-usage.patch
From 6f33846eae0417e46772f4cafe493c3f558c4ad1 Mon Sep 17 00:00:00 2001
From: Matthias van de Meent <boekewurm+postgres@gmail.com>
Date: Fri, 25 Apr 2025 14:57:06 +0200
Subject: [PATCH v01 1/4] Expose visibility checking shim for index usage
This will be the backbone of a fix for visibility issues in GiST and
SP-GiST's index-only scan code in later commits.

The issue to be solved is index scans caching tuples in memory while
those tuples are being removed by a concurrently running vacuum, to the
point that the VACUUM process has removed the tuple and marked its page
ALL_VISIBLE by the time the IOS code returns that tuple through
amgettuple.
While the structure of IndexScanDesc is updated, the new field is
located in an alignment gap, which makes the change safe to backport.
---
src/include/access/relscan.h | 1 +
src/include/access/tableam.h | 53 +++++++++++++++
src/backend/access/index/indexam.c | 6 ++
src/backend/executor/nodeIndexonlyscan.c | 83 ++++++++++++++++--------
src/backend/utils/adt/selfuncs.c | 36 +++++++---
5 files changed, 143 insertions(+), 36 deletions(-)
diff --git a/src/include/access/relscan.h b/src/include/access/relscan.h
index 521043304ab..ba694ac4d1c 100644
--- a/src/include/access/relscan.h
+++ b/src/include/access/relscan.h
@@ -150,6 +150,7 @@ typedef struct IndexScanDescData
IndexFetchTableData *xs_heapfetch;
bool xs_recheck; /* T means scan keys must be rechecked */
+ uint8 xs_visrecheck; /* TMVC_Result; see tableam.h */
/*
* When fetching with an ordering operator, the values of the ORDER BY
diff --git a/src/include/access/tableam.h b/src/include/access/tableam.h
index 7be7887b4a8..000779ac984 100644
--- a/src/include/access/tableam.h
+++ b/src/include/access/tableam.h
@@ -24,6 +24,7 @@
#include "storage/read_stream.h"
#include "utils/rel.h"
#include "utils/snapshot.h"
+#include "visibilitymap.h"
#define DEFAULT_TABLE_ACCESS_METHOD "heap"
@@ -256,6 +257,28 @@ typedef struct TM_IndexDeleteOp
TM_IndexStatus *status;
} TM_IndexDeleteOp;
+/*
+ * Output of table_index_vischeck_tuple()
+ *
+ * Index-only scans need to know the visibility of the associated table
+ * tuple before they can return the index tuple. If the index tuple is
+ * known to be visible with a cheap check, we can return it without asking
+ * the table AM for full visibility information.
+ *
+ * This API exposes a cheap visibility check to index AMs, allowing a scan
+ * to check and cache the visibility of multiple tuples at once. This
+ * improves the pinning ergonomics of index AMs: a scan can buffer index
+ * tuples in memory without having to hold pins on those tuples' index
+ * pages until the tuples are returned, because the visibility information
+ * was captured while the pin was still held.
+ */
+typedef enum TMVC_Result
+{
+ TMVC_Unchecked,
+ TMVC_MaybeVisible,
+ TMVC_Visible,
+} TMVC_Result;
+
/* "options" flag bits for table_tuple_insert */
/* TABLE_INSERT_SKIP_WAL was 0x0001; RelationNeedsWAL() now governs */
#define TABLE_INSERT_SKIP_FSM 0x0002
@@ -1359,6 +1382,36 @@ table_index_delete_tuples(Relation rel, TM_IndexDeleteOp *delstate)
return rel->rd_tableam->index_delete_tuples(rel, delstate);
}
+/*
+ * Determine rough visibility information of an index tuple based on its TID.
+ *
+ * Returns TMVC_Visible if the table tuple pointed to by the TID is known
+ * to be visible to all current transactions, or TMVC_MaybeVisible if the
+ * caller must recheck the tuple's visibility against the table, all at
+ * low IO overhead. For the heap AM, the implementation is effectively a
+ * wrapper around VM_ALL_VISIBLE: if the tuple's heap page is all-visible,
+ * the tuple is visible too; otherwise its visibility is unknown and must
+ * be rechecked in the table AM.
+ *
+ * Note that there is no value for "definitely dead" tuples, as the heap AM
+ * doesn't have an efficient method to determine that a tuple is dead to
+ * all transactions without going into the heap. If and when AMs are built
+ * that support VM checks with an equivalent of VM_ALL_DEAD, this decision
+ * can be reconsidered.
+ */
+static inline TMVC_Result
+table_index_vischeck_tuple(Relation rel, Buffer *vmbuffer, ItemPointer tid)
+{
+ BlockNumber blkno = ItemPointerGetBlockNumberNoCheck(tid);
+ TMVC_Result res;
+
+ if (VM_ALL_VISIBLE(rel, blkno, vmbuffer))
+ res = TMVC_Visible;
+ else
+ res = TMVC_MaybeVisible;
+
+ return res;
+}
/* ----------------------------------------------------------------------------
* Functions for manipulations of physical tuples.
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index dcd04b813d8..2e7e484066b 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -581,6 +581,12 @@ index_getnext_tid(IndexScanDesc scan, ScanDirection direction)
/* XXX: we should assert that a snapshot is pushed or registered */
Assert(TransactionIdIsValid(RecentXmin));
+ /*
+ * Reset xs_visrecheck, so we don't confuse the next tuple's visibility
+ * state with that of the previous.
+ */
+ scan->xs_visrecheck = TMVC_Unchecked;
+
/*
* The AM's amgettuple proc finds the next index entry matching the scan
* keys, and puts the TID into scan->xs_heaptid. It should also set
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index b49194c0167..18818a58f92 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -120,6 +120,7 @@ IndexOnlyNext(IndexOnlyScanState *node)
while ((tid = index_getnext_tid(scandesc, direction)) != NULL)
{
bool tuple_from_heap = false;
+ TMVC_Result vischeck = scandesc->xs_visrecheck;
CHECK_FOR_INTERRUPTS();
@@ -127,6 +128,9 @@ IndexOnlyNext(IndexOnlyScanState *node)
* We can skip the heap fetch if the TID references a heap page on
* which all tuples are known visible to everybody. In any case,
* we'll use the index tuple not the heap tuple as the data source.
+ * The index may have already pre-checked the visibility of the tuple
+ * for us, and stored the result in xs_visrecheck, in which case we
+ * can skip the call.
*
* Note on Memory Ordering Effects: visibilitymap_get_status does not
* lock the visibility map buffer, and therefore the result we read
@@ -156,37 +160,60 @@ IndexOnlyNext(IndexOnlyScanState *node)
*
* It's worth going through this complexity to avoid needing to lock
* the VM buffer, which could cause significant contention.
+ *
+ * The index doing these checks for us doesn't materially change these
+ * considerations.
*/
- if (!VM_ALL_VISIBLE(scandesc->heapRelation,
- ItemPointerGetBlockNumber(tid),
- &node->ioss_VMBuffer))
- {
- /*
- * Rats, we have to visit the heap to check visibility.
- */
- InstrCountTuples2(node, 1);
- if (!index_fetch_heap(scandesc, node->ioss_TableSlot))
- continue; /* no visible tuple, try next index entry */
+ if (vischeck == TMVC_Unchecked)
+ vischeck = table_index_vischeck_tuple(scandesc->heapRelation,
+ &node->ioss_VMBuffer,
+ tid);
- ExecClearTuple(node->ioss_TableSlot);
-
- /*
- * Only MVCC snapshots are supported here, so there should be no
- * need to keep following the HOT chain once a visible entry has
- * been found. If we did want to allow that, we'd need to keep
- * more state to remember not to call index_getnext_tid next time.
- */
- if (scandesc->xs_heap_continue)
- elog(ERROR, "non-MVCC snapshots are not supported in index-only scans");
+ Assert(vischeck != TMVC_Unchecked);
- /*
- * Note: at this point we are holding a pin on the heap page, as
- * recorded in scandesc->xs_cbuf. We could release that pin now,
- * but it's not clear whether it's a win to do so. The next index
- * entry might require a visit to the same heap page.
- */
-
- tuple_from_heap = true;
+ switch (vischeck)
+ {
+ case TMVC_Unchecked:
+ elog(ERROR, "Failed to check visibility for tuple");
+ /*
+ * In case of compilers that don't undertand that elog(ERROR)
+ * doens't exit, and which have -Wimplicit-fallthrough:
+ */
+ /* fallthrough */
+ case TMVC_MaybeVisible:
+ {
+ /*
+ * Rats, we have to visit the heap to check visibility.
+ */
+ InstrCountTuples2(node, 1);
+ if (!index_fetch_heap(scandesc, node->ioss_TableSlot))
+ continue; /* no visible tuple, try next index entry */
+
+ ExecClearTuple(node->ioss_TableSlot);
+
+ /*
+ * Only MVCC snapshots are supported here, so there should be
+ * no need to keep following the HOT chain once a visible
+ * entry has been found. If we did want to allow that, we'd
+ * need to keep more state to remember not to call
+ * index_getnext_tid next time.
+ */
+ if (scandesc->xs_heap_continue)
+ elog(ERROR, "non-MVCC snapshots are not supported in index-only scans");
+
+ /*
+ * Note: at this point we are holding a pin on the heap page,
+ * as recorded in scandesc->xs_cbuf. We could release that
+ * pin now, but it's not clear whether it's a win to do so.
+ * The next index entry might require a visit to the same heap
+ * page.
+ */
+
+ tuple_from_heap = true;
+ break;
+ }
+ case TMVC_Visible:
+ break;
}
/*
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index f420517961f..e608b6a9309 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -6364,15 +6364,29 @@ get_actual_variable_endpoint(Relation heapRel,
/* Set it up for index-only scan */
index_scan->xs_want_itup = true;
index_rescan(index_scan, scankeys, 1, NULL, 0);
+ index_scan->xs_visrecheck = TMVC_Unchecked;
/* Fetch first/next tuple in specified direction */
while ((tid = index_getnext_tid(index_scan, indexscandir)) != NULL)
{
BlockNumber block = ItemPointerGetBlockNumber(tid);
+ TMVC_Result visres = index_scan->xs_visrecheck;
- if (!VM_ALL_VISIBLE(heapRel,
- block,
- &vmbuffer))
+ if (visres == TMVC_Unchecked)
+ visres = table_index_vischeck_tuple(heapRel, &vmbuffer, tid);
+
+ Assert(visres != TMVC_Unchecked);
+
+ switch (visres)
+ {
+ case TMVC_Unchecked:
+ elog(ERROR, "Failed to check visibility for tuple");
+ /*
+ * In case of compilers that don't undertand that elog(ERROR)
+ * doens't exit, and which have -Wimplicit-fallthrough:
+ */
+ /* fallthrough */
+ case TMVC_MaybeVisible:
{
/* Rats, we have to visit the heap to check visibility */
if (!index_fetch_heap(index_scan, tableslot))
@@ -6382,10 +6396,11 @@ get_actual_variable_endpoint(Relation heapRel,
* advance to the next entry. Before doing so, count heap
* page fetches and give up if we've done too many.
*
- * We don't charge a page fetch if this is the same heap page
- * as the previous tuple. This is on the conservative side,
- * since other recently-accessed pages are probably still in
- * buffers too; but it's good enough for this heuristic.
+ * We don't charge a page fetch if this is the same heap
+ * page as the previous tuple. This is on the
+ * conservative side, since other recently-accessed pages
+ * are probably still in buffers too; but it's good enough
+ * for this heuristic.
*/
#define VISITED_PAGES_LIMIT 100
@@ -6394,7 +6409,7 @@ get_actual_variable_endpoint(Relation heapRel,
last_heap_block = block;
n_visited_heap_pages++;
if (n_visited_heap_pages > VISITED_PAGES_LIMIT)
- break;
+ goto exit_loop;
}
continue; /* no visible tuple, try next index entry */
@@ -6407,6 +6422,10 @@ get_actual_variable_endpoint(Relation heapRel,
* We don't care whether there's more than one visible tuple in
* the HOT chain; if any are visible, that's good enough.
*/
+ break;
+ }
+ case TMVC_Visible:
+ break;
}
/*
@@ -6435,6 +6454,7 @@ get_actual_variable_endpoint(Relation heapRel,
have_data = true;
break;
}
+exit_loop:
if (vmbuffer != InvalidBuffer)
ReleaseBuffer(vmbuffer);
--
2.45.2
v01-0003-SP-GIST-Fix-visibility-issues-in-IOS.patch
From 9a6a6ca4f1877f0a112c63b6e47074fa15dd1d4b Mon Sep 17 00:00:00 2001
From: Matthias van de Meent <boekewurm+postgres@gmail.com>
Date: Fri, 25 Apr 2025 17:29:42 +0200
Subject: [PATCH v01 3/4] SP-GIST: Fix visibility issues in IOS
Previously, SP-GiST IOS could buffer tuples from index pages while
VACUUM came along, cleaned up an ALL_DEAD tuple, and marked the tuple's
heap page ALL_VISIBLE again, making IOS mistakenly believe the tuple is
visible.
With this commit, buffer pins now conflict with SP-GiST vacuum, and we
now do visibility checks on heap items returned from index pages while
we still hold the pin, so that the IOS infrastructure knows to recheck
the heap page even if that heap page has become ALL_VISIBLE after we
released the index page.
Idea from Heikki Linnakangas
Backpatch: 17-
---
src/include/access/spgist_private.h | 5 ++
src/backend/access/spgist/spgscan.c | 72 +++++++++++++++++++++------
src/backend/access/spgist/spgvacuum.c | 2 +-
3 files changed, 64 insertions(+), 15 deletions(-)
diff --git a/src/include/access/spgist_private.h b/src/include/access/spgist_private.h
index e7cbe10a89b..dd1dce660e8 100644
--- a/src/include/access/spgist_private.h
+++ b/src/include/access/spgist_private.h
@@ -21,6 +21,7 @@
#include "storage/buf.h"
#include "utils/geo_decls.h"
#include "utils/relcache.h"
+#include "access/tableam.h"
typedef struct SpGistOptions
@@ -175,6 +176,7 @@ typedef struct SpGistSearchItem
bool isLeaf; /* SearchItem is heap item */
bool recheck; /* qual recheck is needed */
bool recheckDistances; /* distance recheck is needed */
+ uint8 visrecheck; /* IOS: TMVC_Result of contained heap tuple */
/* array with numberOfOrderBys entries */
double distances[FLEXIBLE_ARRAY_MEMBER];
@@ -223,6 +225,7 @@ typedef struct SpGistScanOpaqueData
/* These fields are only used in amgettuple scans: */
bool want_itup; /* are we reconstructing tuples? */
+ Buffer vmbuf; /* IOS: used for table_index_vischeck_tuples */
TupleDesc reconTupDesc; /* if so, descriptor for reconstructed tuples */
int nPtrs; /* number of TIDs found on current page */
int iPtr; /* index for scanning through same */
@@ -231,6 +234,8 @@ typedef struct SpGistScanOpaqueData
bool recheckDistances[MaxIndexTuplesPerPage]; /* distance recheck
* flags */
HeapTuple reconTups[MaxIndexTuplesPerPage]; /* reconstructed tuples */
+ /* support for IOS */
+ uint8 *visrecheck; /* IOS vis check results, counted by nPtrs */
/* distances (for recheck) */
IndexOrderByDistance *distances[MaxIndexTuplesPerPage];
diff --git a/src/backend/access/spgist/spgscan.c b/src/backend/access/spgist/spgscan.c
index 03293a7816e..f3e5d9bf868 100644
--- a/src/backend/access/spgist/spgscan.c
+++ b/src/backend/access/spgist/spgscan.c
@@ -30,7 +30,8 @@
typedef void (*storeRes_func) (SpGistScanOpaque so, ItemPointer heapPtr,
Datum leafValue, bool isNull,
SpGistLeafTuple leafTuple, bool recheck,
- bool recheckDistances, double *distances);
+ bool recheckDistances, double *distances,
+ TMVC_Result vischeck);
/*
* Pairing heap comparison function for the SpGistSearchItem queue.
@@ -142,6 +143,7 @@ spgAddStartItem(SpGistScanOpaque so, bool isnull)
startEntry->traversalValue = NULL;
startEntry->recheck = false;
startEntry->recheckDistances = false;
+ startEntry->visrecheck = TMVC_Unchecked;
spgAddSearchItemToQueue(so, startEntry);
}
@@ -189,6 +191,11 @@ resetSpGistScanOpaque(SpGistScanOpaque so)
for (i = 0; i < so->nPtrs; i++)
pfree(so->reconTups[i]);
+
+ if (BufferIsValid(so->vmbuf))
+ ReleaseBuffer(so->vmbuf);
+
+ so->vmbuf = InvalidBuffer;
}
so->iPtr = so->nPtrs = 0;
}
@@ -387,6 +394,13 @@ spgrescan(IndexScanDesc scan, ScanKey scankey, int nscankeys,
memmove(scan->keyData, scankey,
scan->numberOfKeys * sizeof(ScanKeyData));
+ /* prepare index-only scan requirements */
+ if (scan->xs_want_itup)
+ {
+ if (so->visrecheck == NULL)
+ so->visrecheck = palloc(MaxIndexTuplesPerPage);
+ }
+
/* initialize order-by data if needed */
if (orderbys && scan->numberOfOrderBys > 0)
{
@@ -453,6 +467,9 @@ spgendscan(IndexScanDesc scan)
pfree(scan->xs_orderbynulls);
}
+ if (BufferIsValid(so->vmbuf))
+ ReleaseBuffer(so->vmbuf);
+
pfree(so);
}
@@ -502,6 +519,7 @@ spgNewHeapItem(SpGistScanOpaque so, int level, SpGistLeafTuple leafTuple,
item->isLeaf = true;
item->recheck = recheck;
item->recheckDistances = recheckDistances;
+ item->visrecheck = TMVC_Unchecked;
return item;
}
@@ -515,7 +533,8 @@ spgNewHeapItem(SpGistScanOpaque so, int level, SpGistLeafTuple leafTuple,
static bool
spgLeafTest(SpGistScanOpaque so, SpGistSearchItem *item,
SpGistLeafTuple leafTuple, bool isnull,
- bool *reportedSome, storeRes_func storeRes)
+ bool *reportedSome, storeRes_func storeRes,
+ Relation tableRel)
{
Datum leafValue;
double *distances;
@@ -571,6 +590,15 @@ spgLeafTest(SpGistScanOpaque so, SpGistSearchItem *item,
if (result)
{
+ TMVC_Result vischeck = TMVC_Unchecked;
+
+ if (so->want_itup)
+ {
+ Assert(PointerIsValid(so->visrecheck));
+ vischeck = table_index_vischeck_tuple(tableRel, &so->vmbuf,
+ &item->heapPtr);
+ }
+
/* item passes the scankeys */
if (so->numberOfNonNullOrderBys > 0)
{
@@ -583,6 +611,7 @@ spgLeafTest(SpGistScanOpaque so, SpGistSearchItem *item,
recheckDistances,
isnull,
distances);
+ heapItem->visrecheck = vischeck;
spgAddSearchItemToQueue(so, heapItem);
@@ -593,7 +622,7 @@ spgLeafTest(SpGistScanOpaque so, SpGistSearchItem *item,
/* non-ordered scan, so report the item right away */
Assert(!recheckDistances);
storeRes(so, &leafTuple->heapPtr, leafValue, isnull,
- leafTuple, recheck, false, NULL);
+ leafTuple, recheck, false, NULL, vischeck);
*reportedSome = true;
}
}
@@ -765,7 +794,8 @@ spgTestLeafTuple(SpGistScanOpaque so,
Page page, OffsetNumber offset,
bool isnull, bool isroot,
bool *reportedSome,
- storeRes_func storeRes)
+ storeRes_func storeRes,
+ Relation tablerel)
{
SpGistLeafTuple leafTuple = (SpGistLeafTuple)
PageGetItem(page, PageGetItemId(page, offset));
@@ -801,7 +831,8 @@ spgTestLeafTuple(SpGistScanOpaque so,
Assert(ItemPointerIsValid(&leafTuple->heapPtr));
- spgLeafTest(so, item, leafTuple, isnull, reportedSome, storeRes);
+ spgLeafTest(so, item, leafTuple, isnull, reportedSome, storeRes,
+ tablerel);
return SGLT_GET_NEXTOFFSET(leafTuple);
}
@@ -814,8 +845,8 @@ spgTestLeafTuple(SpGistScanOpaque so,
* next page boundary once we have reported at least one tuple.
*/
static void
-spgWalk(Relation index, SpGistScanOpaque so, bool scanWholeIndex,
- storeRes_func storeRes)
+spgWalk(Relation index, Relation table, SpGistScanOpaque so,
+ bool scanWholeIndex, storeRes_func storeRes)
{
Buffer buffer = InvalidBuffer;
bool reportedSome = false;
@@ -837,7 +868,8 @@ redirect:
Assert(so->numberOfNonNullOrderBys > 0);
storeRes(so, &item->heapPtr, item->value, item->isNull,
item->leafTuple, item->recheck,
- item->recheckDistances, item->distances);
+ item->recheckDistances, item->distances,
+ item->visrecheck);
reportedSome = true;
}
else
@@ -876,7 +908,8 @@ redirect:
for (offset = FirstOffsetNumber; offset <= max; offset++)
(void) spgTestLeafTuple(so, item, page, offset,
isnull, true,
- &reportedSome, storeRes);
+ &reportedSome, storeRes,
+ table);
}
else
{
@@ -886,7 +919,8 @@ redirect:
Assert(offset >= FirstOffsetNumber && offset <= max);
offset = spgTestLeafTuple(so, item, page, offset,
isnull, false,
- &reportedSome, storeRes);
+ &reportedSome, storeRes,
+ table);
if (offset == SpGistRedirectOffsetNumber)
goto redirect;
}
@@ -931,7 +965,8 @@ static void
storeBitmap(SpGistScanOpaque so, ItemPointer heapPtr,
Datum leafValue, bool isnull,
SpGistLeafTuple leafTuple, bool recheck,
- bool recheckDistances, double *distances)
+ bool recheckDistances, double *distances,
+ TMVC_Result vischeck)
{
Assert(!recheckDistances && !distances);
tbm_add_tuples(so->tbm, heapPtr, 1, recheck);
@@ -949,7 +984,7 @@ spggetbitmap(IndexScanDesc scan, TIDBitmap *tbm)
so->tbm = tbm;
so->ntids = 0;
- spgWalk(scan->indexRelation, so, true, storeBitmap);
+ spgWalk(scan->indexRelation, scan->heapRelation, so, true, storeBitmap);
return so->ntids;
}
@@ -959,7 +994,8 @@ static void
storeGettuple(SpGistScanOpaque so, ItemPointer heapPtr,
Datum leafValue, bool isnull,
SpGistLeafTuple leafTuple, bool recheck,
- bool recheckDistances, double *nonNullDistances)
+ bool recheckDistances, double *nonNullDistances,
+ TMVC_Result vischeck)
{
Assert(so->nPtrs < MaxIndexTuplesPerPage);
so->heapPtrs[so->nPtrs] = *heapPtr;
@@ -1018,6 +1054,8 @@ storeGettuple(SpGistScanOpaque so, ItemPointer heapPtr,
so->reconTups[so->nPtrs] = heap_form_tuple(so->reconTupDesc,
leafDatums,
leafIsnulls);
+
+ so->visrecheck[so->nPtrs] = vischeck;
}
so->nPtrs++;
}
@@ -1042,6 +1080,11 @@ spggettuple(IndexScanDesc scan, ScanDirection dir)
scan->xs_recheck = so->recheck[so->iPtr];
scan->xs_hitup = so->reconTups[so->iPtr];
+ if (so->want_itup)
+ scan->xs_visrecheck = so->visrecheck[so->iPtr];
+
+ Assert(!scan->xs_want_itup || scan->xs_visrecheck != TMVC_Unchecked);
+
if (so->numberOfOrderBys > 0)
index_store_float8_orderby_distances(scan, so->orderByTypes,
so->distances[so->iPtr],
@@ -1070,7 +1113,8 @@ spggettuple(IndexScanDesc scan, ScanDirection dir)
}
so->iPtr = so->nPtrs = 0;
- spgWalk(scan->indexRelation, so, false, storeGettuple);
+ spgWalk(scan->indexRelation, scan->heapRelation, so, false,
+ storeGettuple);
if (so->nPtrs == 0)
break; /* must have completed scan */
diff --git a/src/backend/access/spgist/spgvacuum.c b/src/backend/access/spgist/spgvacuum.c
index 0da069fd4d7..d0680a5073e 100644
--- a/src/backend/access/spgist/spgvacuum.c
+++ b/src/backend/access/spgist/spgvacuum.c
@@ -629,7 +629,7 @@ spgvacuumpage(spgBulkDeleteState *bds, BlockNumber blkno)
buffer = ReadBufferExtended(index, MAIN_FORKNUM, blkno,
RBM_NORMAL, bds->info->strategy);
- LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
+ LockBufferForCleanup(buffer);
page = (Page) BufferGetPage(buffer);
if (PageIsNew(page))
--
2.45.2
v01-0004-Test-for-IOS-Vacuum-race-conditions-in-index-AMs.patch
From 0d504eacaea91f1da5483b142e94b980456a711f Mon Sep 17 00:00:00 2001
From: Matthias van de Meent <boekewurm+postgres@gmail.com>
Date: Fri, 21 Mar 2025 16:41:31 +0100
Subject: [PATCH v01 4/4] Test for IOS/Vacuum race conditions in index AMs
Add regression tests that demonstrate that wrong results can occur with
index-only scans in GiST and SP-GiST indexes when they encounter tuples
being removed by a concurrent VACUUM operation.

With these tests the index AMs are also expected not to block VACUUM,
even when they're used inside a cursor.
Reported-by: Peter Geoghegan <pg@bowt.ie>
Co-authored-by: Matthias van de Meent <boekewurm+postgres@gmail.com>
Co-authored-by: Peter Geoghegan <pg@bowt.ie>
Co-authored-by: Michail Nikolaev <michail.nikolaev@gmail.com>
Discussion: https://postgr.es/m/flat/CANtu0oi0rkR%2BFsgyLXnGZ-uW2950-urApAWLhy-%2BV1WJD%3D_ZXA%40mail.gmail.com
---
.../expected/index-only-scan-gist-vacuum.out | 53 +++++++
.../index-only-scan-spgist-vacuum.out | 53 +++++++
src/test/isolation/isolation_schedule | 2 +
.../specs/index-only-scan-gist-vacuum.spec | 112 ++++++++++++++
.../specs/index-only-scan-spgist-vacuum.spec | 112 ++++++++++++++
.../isolation/specs/vacuum-ios-conflicts.spec | 143 ++++++++++++++++++
6 files changed, 475 insertions(+)
create mode 100644 src/test/isolation/expected/index-only-scan-gist-vacuum.out
create mode 100644 src/test/isolation/expected/index-only-scan-spgist-vacuum.out
create mode 100644 src/test/isolation/specs/index-only-scan-gist-vacuum.spec
create mode 100644 src/test/isolation/specs/index-only-scan-spgist-vacuum.spec
create mode 100644 src/test/isolation/specs/vacuum-ios-conflicts.spec
diff --git a/src/test/isolation/expected/index-only-scan-gist-vacuum.out b/src/test/isolation/expected/index-only-scan-gist-vacuum.out
new file mode 100644
index 00000000000..b7c02ee9529
--- /dev/null
+++ b/src/test/isolation/expected/index-only-scan-gist-vacuum.out
@@ -0,0 +1,53 @@
+Parsed test spec with 2 sessions
+
+starting permutation: s2_mod s1_begin s1_prepare_sorted s1_fetch_1 s2_vacuum s1_fetch_all s1_commit
+step s2_mod:
+ DELETE FROM ios_needs_cleanup_lock WHERE a != point '(1,1)';
+
+step s1_begin: BEGIN;
+step s1_prepare_sorted:
+ DECLARE foo NO SCROLL CURSOR FOR SELECT a <-> point '(0,0)' as x FROM ios_needs_cleanup_lock ORDER BY a <-> point '(0,0)';
+
+step s1_fetch_1:
+ FETCH FROM foo;
+
+ x
+------------------
+1.4142135623730951
+(1 row)
+
+step s2_vacuum: VACUUM (TRUNCATE false) ios_needs_cleanup_lock;
+step s1_fetch_all:
+ FETCH ALL FROM foo;
+
+x
+-
+(0 rows)
+
+step s1_commit: COMMIT;
+
+starting permutation: s2_mod s1_begin s1_prepare_unsorted s1_fetch_1 s2_vacuum s1_fetch_all s1_commit
+step s2_mod:
+ DELETE FROM ios_needs_cleanup_lock WHERE a != point '(1,1)';
+
+step s1_begin: BEGIN;
+step s1_prepare_unsorted:
+ DECLARE foo NO SCROLL CURSOR FOR SELECT a FROM ios_needs_cleanup_lock WHERE box '((-100,-100),(100,100))' @> a;
+
+step s1_fetch_1:
+ FETCH FROM foo;
+
+a
+-----
+(1,1)
+(1 row)
+
+step s2_vacuum: VACUUM (TRUNCATE false) ios_needs_cleanup_lock;
+step s1_fetch_all:
+ FETCH ALL FROM foo;
+
+a
+-
+(0 rows)
+
+step s1_commit: COMMIT;
diff --git a/src/test/isolation/expected/index-only-scan-spgist-vacuum.out b/src/test/isolation/expected/index-only-scan-spgist-vacuum.out
new file mode 100644
index 00000000000..b7c02ee9529
--- /dev/null
+++ b/src/test/isolation/expected/index-only-scan-spgist-vacuum.out
@@ -0,0 +1,53 @@
+Parsed test spec with 2 sessions
+
+starting permutation: s2_mod s1_begin s1_prepare_sorted s1_fetch_1 s2_vacuum s1_fetch_all s1_commit
+step s2_mod:
+ DELETE FROM ios_needs_cleanup_lock WHERE a != point '(1,1)';
+
+step s1_begin: BEGIN;
+step s1_prepare_sorted:
+ DECLARE foo NO SCROLL CURSOR FOR SELECT a <-> point '(0,0)' as x FROM ios_needs_cleanup_lock ORDER BY a <-> point '(0,0)';
+
+step s1_fetch_1:
+ FETCH FROM foo;
+
+ x
+------------------
+1.4142135623730951
+(1 row)
+
+step s2_vacuum: VACUUM (TRUNCATE false) ios_needs_cleanup_lock;
+step s1_fetch_all:
+ FETCH ALL FROM foo;
+
+x
+-
+(0 rows)
+
+step s1_commit: COMMIT;
+
+starting permutation: s2_mod s1_begin s1_prepare_unsorted s1_fetch_1 s2_vacuum s1_fetch_all s1_commit
+step s2_mod:
+ DELETE FROM ios_needs_cleanup_lock WHERE a != point '(1,1)';
+
+step s1_begin: BEGIN;
+step s1_prepare_unsorted:
+ DECLARE foo NO SCROLL CURSOR FOR SELECT a FROM ios_needs_cleanup_lock WHERE box '((-100,-100),(100,100))' @> a;
+
+step s1_fetch_1:
+ FETCH FROM foo;
+
+a
+-----
+(1,1)
+(1 row)
+
+step s2_vacuum: VACUUM (TRUNCATE false) ios_needs_cleanup_lock;
+step s1_fetch_all:
+ FETCH ALL FROM foo;
+
+a
+-
+(0 rows)
+
+step s1_commit: COMMIT;
diff --git a/src/test/isolation/isolation_schedule b/src/test/isolation/isolation_schedule
index 143109aa4da..9720c9a2dc8 100644
--- a/src/test/isolation/isolation_schedule
+++ b/src/test/isolation/isolation_schedule
@@ -17,6 +17,8 @@ test: partial-index
test: two-ids
test: multiple-row-versions
test: index-only-scan
+test: index-only-scan-gist-vacuum
+test: index-only-scan-spgist-vacuum
test: predicate-lock-hot-tuple
test: update-conflict-out
test: deadlock-simple
diff --git a/src/test/isolation/specs/index-only-scan-gist-vacuum.spec b/src/test/isolation/specs/index-only-scan-gist-vacuum.spec
new file mode 100644
index 00000000000..9d241b25920
--- /dev/null
+++ b/src/test/isolation/specs/index-only-scan-gist-vacuum.spec
@@ -0,0 +1,112 @@
+# index-only-scan test showing wrong results with GiST
+#
+setup
+{
+ -- by using a low fillfactor and a wide tuple we can get multiple blocks
+ -- with just a few rows
+ CREATE TABLE ios_needs_cleanup_lock (a point NOT NULL, b int not null, pad char(1024) default '')
+ WITH (AUTOVACUUM_ENABLED = false, FILLFACTOR = 10);
+
+ INSERT INTO ios_needs_cleanup_lock SELECT point(g.i, g.i), g.i FROM generate_series(1, 10) g(i);
+
+ CREATE INDEX ios_gist_a ON ios_needs_cleanup_lock USING gist(a);
+}
+setup
+{
+ VACUUM (ANALYZE) ios_needs_cleanup_lock;
+}
+
+teardown
+{
+ DROP TABLE ios_needs_cleanup_lock;
+}
+
+
+session s1
+
+# Force an index-only scan, where possible:
+setup {
+ SET enable_bitmapscan = false;
+ SET enable_indexonlyscan = true;
+ SET enable_indexscan = true;
+}
+
+step s1_begin { BEGIN; }
+step s1_commit { COMMIT; }
+
+step s1_prepare_sorted {
+ DECLARE foo NO SCROLL CURSOR FOR SELECT a <-> point '(0,0)' as x FROM ios_needs_cleanup_lock ORDER BY a <-> point '(0,0)';
+}
+
+step s1_prepare_unsorted {
+ DECLARE foo NO SCROLL CURSOR FOR SELECT a FROM ios_needs_cleanup_lock WHERE box '((-100,-100),(100,100))' @> a;
+}
+
+step s1_fetch_1 {
+ FETCH FROM foo;
+}
+
+step s1_fetch_all {
+ FETCH ALL FROM foo;
+}
+
+
+session s2
+
+# Don't delete row 1 so we have a row for the cursor to "rest" on.
+step s2_mod
+{
+ DELETE FROM ios_needs_cleanup_lock WHERE a != point '(1,1)';
+}
+
+# Disable truncation, as otherwise we'll just wait for a timeout while trying
+# to acquire the lock
+step s2_vacuum { VACUUM (TRUNCATE false) ios_needs_cleanup_lock; }
+
+permutation
+ # delete nearly all rows, to make issue visible
+ s2_mod
+ # create a cursor
+ s1_begin
+ s1_prepare_sorted
+
+ # fetch one row from the cursor; that ensures the index scan portion is done
+ # before the vacuum in the next step
+ s1_fetch_1
+
+ # with the bug this vacuum will mark pages as all-visible that the scan in
+ # the next step then considers all-visible, despite all rows from those
+ # pages having been removed.
+ # Because this should block on buffer-level locks, this won't ever be
+ # considered "blocked" by the isolation tester, and so we only have a single
+ # step we can work with concurrently.
+ s2_vacuum
+
+ # if this returns any rows, we're busted
+ s1_fetch_all
+
+ s1_commit
+
+permutation
+ # delete nearly all rows, to make issue visible
+ s2_mod
+ # create a cursor
+ s1_begin
+ s1_prepare_unsorted
+
+ # fetch one row from the cursor; that ensures the index scan portion is done
+ # before the vacuum in the next step
+ s1_fetch_1
+
+ # with the bug this vacuum will mark pages as all-visible that the scan in
+ # the next step then considers all-visible, despite all rows from those
+ # pages having been removed.
+ # Because this should block on buffer-level locks, this won't ever be
+ # considered "blocked" by the isolation tester, and so we only have a single
+ # step we can work with concurrently.
+ s2_vacuum
+
+ # if this returns any rows, we're busted
+ s1_fetch_all
+
+ s1_commit
diff --git a/src/test/isolation/specs/index-only-scan-spgist-vacuum.spec b/src/test/isolation/specs/index-only-scan-spgist-vacuum.spec
new file mode 100644
index 00000000000..cd621d4f7f2
--- /dev/null
+++ b/src/test/isolation/specs/index-only-scan-spgist-vacuum.spec
@@ -0,0 +1,112 @@
+# index-only-scan test showing wrong results with SPGiST
+#
+setup
+{
+ -- by using a low fillfactor and a wide tuple we can get multiple blocks
+ -- with just a few rows
+ CREATE TABLE ios_needs_cleanup_lock (a point NOT NULL, b int not null, pad char(1024) default '')
+ WITH (AUTOVACUUM_ENABLED = false, FILLFACTOR = 10);
+
+ INSERT INTO ios_needs_cleanup_lock SELECT point(g.i, g.i), g.i FROM generate_series(1, 10) g(i);
+
+ CREATE INDEX ios_spgist_a ON ios_needs_cleanup_lock USING spgist(a);
+}
+setup
+{
+ VACUUM (ANALYZE) ios_needs_cleanup_lock;
+}
+
+teardown
+{
+ DROP TABLE ios_needs_cleanup_lock;
+}
+
+
+session s1
+
+# Force an index-only scan, where possible:
+setup {
+ SET enable_bitmapscan = false;
+ SET enable_indexonlyscan = true;
+ SET enable_indexscan = true;
+}
+
+step s1_begin { BEGIN; }
+step s1_commit { COMMIT; }
+
+step s1_prepare_sorted {
+ DECLARE foo NO SCROLL CURSOR FOR SELECT a <-> point '(0,0)' as x FROM ios_needs_cleanup_lock ORDER BY a <-> point '(0,0)';
+}
+
+step s1_prepare_unsorted {
+ DECLARE foo NO SCROLL CURSOR FOR SELECT a FROM ios_needs_cleanup_lock WHERE box '((-100,-100),(100,100))' @> a;
+}
+
+step s1_fetch_1 {
+ FETCH FROM foo;
+}
+
+step s1_fetch_all {
+ FETCH ALL FROM foo;
+}
+
+
+session s2
+
+# Don't delete row 1 so we have a row for the cursor to "rest" on.
+step s2_mod
+{
+ DELETE FROM ios_needs_cleanup_lock WHERE a != point '(1,1)';
+}
+
+# Disable truncation, as otherwise we'll just wait for a timeout while trying
+# to acquire the lock
+step s2_vacuum { VACUUM (TRUNCATE false) ios_needs_cleanup_lock; }
+
+permutation
+ # delete nearly all rows, to make issue visible
+ s2_mod
+ # create a cursor
+ s1_begin
+ s1_prepare_sorted
+
+ # fetch one row from the cursor; that ensures the index scan portion is done
+ # before the vacuum in the next step
+ s1_fetch_1
+
+ # with the bug this vacuum will mark pages as all-visible that the scan in
+ # the next step then considers all-visible, despite all rows from those
+ # pages having been removed.
+ # Because this should block on buffer-level locks, this won't ever be
+ # considered "blocked" by the isolation tester, and so we only have a single
+ # step we can work with concurrently.
+ s2_vacuum
+
+ # if this returns any rows, we're busted
+ s1_fetch_all
+
+ s1_commit
+
+permutation
+ # delete nearly all rows, to make issue visible
+ s2_mod
+ # create a cursor
+ s1_begin
+ s1_prepare_unsorted
+
+ # fetch one row from the cursor; that ensures the index scan portion is done
+ # before the vacuum in the next step
+ s1_fetch_1
+
+ # with the bug this vacuum will mark pages as all-visible that the scan in
+ # the next step then considers all-visible, despite all rows from those
+ # pages having been removed.
+ # Because this should block on buffer-level locks, this won't ever be
+ # considered "blocked" by the isolation tester, and so we only have a single
+ # step we can work with concurrently.
+ s2_vacuum
+
+ # if this returns any rows, we're busted
+ s1_fetch_all
+
+ s1_commit
diff --git a/src/test/isolation/specs/vacuum-ios-conflicts.spec b/src/test/isolation/specs/vacuum-ios-conflicts.spec
new file mode 100644
index 00000000000..1944d9adad1
--- /dev/null
+++ b/src/test/isolation/specs/vacuum-ios-conflicts.spec
@@ -0,0 +1,143 @@
+# index-only-scan test showing correct locking interactions with various
+# index types:
+# - btree (forward, backward),
+# - GIST (ordered, unordered), and
+# - SP-GIST (ordered, unordered)
+setup
+{
+ -- by using a low fillfactor and a wide tuple we can get multiple blocks
+ -- with just a few rows
+ CREATE TABLE ios_needs_cleanup_lock (a int NOT NULL, b int not null, pad char(1024) default '')
+ WITH (AUTOVACUUM_ENABLED = false, FILLFACTOR = 10);
+
+ INSERT INTO ios_needs_cleanup_lock SELECT g.i, g.i FROM generate_series(1, 10) g(i);
+}
+
+teardown
+{
+ DROP TABLE ios_needs_cleanup_lock;
+}
+
+
+session s1
+
+# Force an index-only scan, where possible:
+setup {
+ SET enable_bitmapscan = false;
+ SET enable_indexonlyscan = true;
+ SET enable_indexscan = true;
+}
+
+step s1_begin { BEGIN; }
+step s1_commit { COMMIT; }
+
+# btree
+step s1_index_btree {
+ CREATE INDEX ios_vacuum_conflict_a ON ios_needs_cleanup_lock USING btree (a);
+}
+step s1_prepare_btree_forward {
+ DECLARE foo NO SCROLL CURSOR FOR SELECT a FROM ios_needs_cleanup_lock ORDER BY a ASC;
+}
+step s1_prepare_btree_backward {
+ DECLARE foo NO SCROLL CURSOR FOR SELECT a FROM ios_needs_cleanup_lock ORDER BY a DESC;
+}
+
+# GIST
+step s1_index_gist {
+ CREATE INDEX ios_vacuum_conflict_a ON ios_needs_cleanup_lock USING gist(a);
+}
+step s1_prepare_gist_sorted {
+ DECLARE foo NO SCROLL CURSOR FOR SELECT a FROM ios_needs_cleanup_lock WHERE a > 0 ORDER BY a <-> 0;
+}
+step s1_prepare_gist_unsorted {
+ DECLARE foo NO SCROLL CURSOR FOR SELECT a FROM ios_needs_cleanup_lock WHERE a > 0;
+}
+
+# SP-GiST
+step s1_index_spg {
+ CREATE INDEX ios_vacuum_conflict_a ON ios_needs_cleanup_lock USING spgist(a);
+}
+step s1_prepare_spg_sorted {
+ DECLARE foo NO SCROLL CURSOR FOR SELECT a FROM ios_needs_cleanup_lock WHERE a > 0 ORDER BY a <-> 0;
+}
+step s1_prepare_spg_unsorted {
+ DECLARE foo NO SCROLL CURSOR FOR SELECT a FROM ios_needs_cleanup_lock WHERE a > 0;
+}
+
+step s1_fetch_1 {
+ FETCH FROM foo;
+}
+
+step s1_fetch_all {
+ SELECT pg_sleep_for(INTERVAL '50ms');
+ FETCH ALL FROM foo;
+}
+
+
+session s2
+
+# Don't delete row 1 so we have a row for the cursor to "rest" on.
+step s2_mod
+{
+ DELETE FROM ios_needs_cleanup_lock WHERE a > 1;
+}
+
+# Disable truncation, as otherwise we'll just wait for a timeout while trying
+# to acquire the lock
+step s2_vacuum { VACUUM (TRUNCATE false) ios_needs_cleanup_lock; }
+
+permutation
+ # Vacuum first, to ensure VM exists, otherwise the bitmapscan will consider
+ # VM to be size 0, due to caching. Can't do that in setup because
+ s2_vacuum
+
+ # delete nearly all rows, to make issue visible
+ s2_mod
+ # create a cursor
+ s1_begin
+ s1_prepare_sorted
+
+ # fetch one row from the cursor; that ensures the index scan portion is done
+ # before the vacuum in the next step
+ s1_fetch_1
+
+ # with the bug this vacuum will mark pages as all-visible that the scan in
+ # the next step then considers all-visible, despite all rows from those
+ # pages having been removed.
+ # Because this should block on buffer-level locks, this won't ever be
+ # considered "blocked" by the isolation tester, and so we only have a single
+ # step we can work with concurrently.
+ s2_vacuum (*)
+
+ # if this returns any rows, we're busted
+ s1_fetch_all
+
+ s1_commit
+
+permutation
+ # Vacuum first, to ensure VM exists, otherwise the bitmapscan will consider
+ # VM to be size 0, due to caching. Can't do that in setup because
+ s2_vacuum
+
+ # delete nearly all rows, to make issue visible
+ s2_mod
+ # create a cursor
+ s1_begin
+ s1_prepare_unsorted
+
+ # fetch one row from the cursor; that ensures the index scan portion is done
+ # before the vacuum in the next step
+ s1_fetch_1
+
+ # with the bug this vacuum will mark pages as all-visible that the scan in
+ # the next step then considers all-visible, despite all rows from those
+ # pages having been removed.
+ # Because this should block on buffer-level locks, this won't ever be
+ # considered "blocked" by the isolation tester, and so we only have a single
+ # step we can work with concurrently.
+ s2_vacuum (*)
+
+ # if this returns any rows, we're busted
+ s1_fetch_all
+
+ s1_commit
--
2.45.2