Multi-Entry Indexing for GiST & SP-GiST

Started by Maxime Schoemans, 2 months ago, 8 messages, hackers

maxime.schoemans@enterprisedb.com

2 months ago

Hi hackers,

This patch set adds multi-entry support to the GiST and SP-GiST access
methods, allowing a single heap tuple to produce multiple index entries --
similar to how GIN decomposes values via extractValue, but within the
R-tree (GiST) and quad-tree (SP-GiST) frameworks.

I presented the concept and an extension-based prototype (MEST) at
PGConf.dev 2024 [1]. This patch set integrates the feature directly into
the GiST and SP-GiST access methods rather than maintaining a separate
forked AM.

Motivation
----------

The existing GiST multirange opclass compresses a multirange into its
bounding union range, losing information about gaps. This makes operators
like @> and <@ imprecise at the index level, requiring many false-positive
rechecks. Multi-entry GiST instead stores each component range as a
separate index entry, giving the R-tree much tighter bounds and
significantly reducing rechecks for multiranges with gaps.

SP-GiST currently has no multirange opclass at all. The multi-entry
infrastructure lets us add one: multiranges are decomposed into component
ranges and indexed using the existing range quad-tree, providing SP-GiST
multirange support for the first time.

The approach generalizes beyond multiranges: any composite data type that
can be meaningfully decomposed into simpler sub-entries can benefit (e.g.,
arrays of geometric types, route geometries).

Design
------

A new optional support function, extractValue, is added to both GiST
(proc 13) and SP-GiST (proc 8). When an opclass provides it, the insert
and build paths call it to decompose each datum into multiple sub-entries,
each stored as a separate index tuple pointing to the same heap TID.
The signature mirrors GIN's:

Datum *extractValue(Datum value, int32 *nentries, bool **nullFlags)

The opclass returns a palloc'd array of sub-entries (typed as the index
key column's type), sets *nentries, and optionally fills *nullFlags.
If it returns zero entries for a non-NULL input, the AM falls back to a
single NULL index entry (matching only IS NULL); opclasses that want
such values to remain visible to operator queries should return a
sentinel instead -- e.g., multirange_me_ops stores an empty range for
empty multiranges.
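The contract above can be illustrated with a minimal C sketch. It uses a toy integer multirange instead of PostgreSQL's Datum and palloc machinery. IntRange, IntMultirange and toy_extract_value are hypothetical names, and malloc stands in for palloc. Only the shape of the GIN-style signature and the empty-multirange sentinel rule come from the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Toy stand-ins for anyrange/anymultirange; hypothetical, not PostgreSQL types. */
typedef struct { int lo, hi; bool empty; } IntRange;      /* [lo, hi) */
typedef struct { int nranges; const IntRange *ranges; } IntMultirange;

/*
 * Hypothetical extractValue-style decomposition with the GIN-like shape:
 * one sub-entry per component range, *nentries set, *nullFlags left NULL
 * (no NULL sub-entries). An empty multirange yields a single empty-range
 * sentinel instead of zero entries, so the AM does not fall back to a
 * NULL index entry. malloc stands in for palloc.
 */
IntRange *
toy_extract_value(const IntMultirange *mr, int32_t *nentries, bool **nullFlags)
{
    int32_t n = mr->nranges > 0 ? mr->nranges : 1;
    IntRange *entries;

    assert(mr->nranges >= 0);
    entries = malloc(sizeof(IntRange) * (size_t) n);
    if (entries == NULL)
        abort();

    if (mr->nranges == 0)
        entries[0] = (IntRange) {0, 0, true};   /* empty-range sentinel */
    else
        for (int32_t i = 0; i < n; i++)
            entries[i] = mr->ranges[i];

    *nentries = n;
    *nullFlags = NULL;
    return entries;
}
```

With this sketch, {[1,11), [100001,100011)} yields two entries. An empty multirange yields one empty-range sentinel rather than zero entries.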

On the scan side, a simplehash-based TID hash deduplicates results so
each heap tuple is returned only once. For ordered (KNN) scans the
same hash is also consulted before enqueuing leaf items as an early
filter, but the dedup insert happens at dequeue time so the pairing
heap can pick the copy with the smallest distance.
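The scan-side deduplication can be sketched in plain C. This is an illustrative stand-in, not the patch's code. A small open-addressing set replaces simplehash, and qsort over one batch replaces the pairing heap. ToyTid, TidSet and knn_dedup are hypothetical names. The sketch shows the two points from the paragraph above:

- Copies of already-returned TIDs are filtered out before enqueueing.
- The set insert happens at dequeue, so the smallest-distance copy of a TID is the one returned.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Toy heap TID (hypothetical): block number plus line-pointer offset. */
typedef struct { uint32_t block; uint16_t offset; } ToyTid;

/* Open-addressing TID set with linear probing. Sketch only: no resizing,
 * so capacity must exceed the number of distinct TIDs ever inserted. */
typedef struct { uint64_t *keys; bool *used; size_t mask; } TidSet;

static uint64_t tid_key(ToyTid t) { return ((uint64_t) t.block << 16) | t.offset; }

static size_t tid_hash(uint64_t k)
{
    k ^= k >> 33;                       /* cheap 64-bit bit mix */
    k *= 0xff51afd7ed558ccdULL;
    k ^= k >> 33;
    return (size_t) k;
}

TidSet tidset_create(size_t capacity)
{
    TidSet s;

    assert(capacity > 0 && (capacity & (capacity - 1)) == 0);  /* power of two */
    s.keys = calloc(capacity, sizeof(uint64_t));
    s.used = calloc(capacity, sizeof(bool));
    s.mask = capacity - 1;
    if (s.keys == NULL || s.used == NULL)
        abort();
    return s;
}

bool tidset_contains(const TidSet *s, ToyTid t)
{
    uint64_t k = tid_key(t);

    for (size_t i = tid_hash(k) & s->mask; s->used[i]; i = (i + 1) & s->mask)
        if (s->keys[i] == k)
            return true;
    return false;
}

/* Returns true if t was newly added, false if it was already present. */
bool tidset_insert(TidSet *s, ToyTid t)
{
    uint64_t k = tid_key(t);
    size_t i = tid_hash(k) & s->mask;

    for (; s->used[i]; i = (i + 1) & s->mask)
        if (s->keys[i] == k)
            return false;
    s->used[i] = true;
    s->keys[i] = k;
    return true;
}

typedef struct { ToyTid tid; double dist; } ToyItem;

static int by_dist(const void *a, const void *b)
{
    double da = ((const ToyItem *) a)->dist, db = ((const ToyItem *) b)->dist;
    return (da > db) - (da < db);
}

/* One batch of an ordered scan; returns the number of items emitted to out. */
size_t knn_dedup(ToyItem *items, size_t n, TidSet *seen, ToyItem *out)
{
    size_t m = 0, nout = 0;

    /* Enqueue-time early filter: drop copies of TIDs already returned. */
    for (size_t i = 0; i < n; i++)
        if (!tidset_contains(seen, items[i].tid))
            items[m++] = items[i];

    /* Dequeue in distance order; the dedup insert happens only here. */
    qsort(items, m, sizeof(ToyItem), by_dist);
    for (size_t i = 0; i < m; i++)
        if (tidset_insert(seen, items[i].tid))
            out[nout++] = items[i];
    return nout;
}
```

Because the insert happens in distance order, a TID seen at distances 5.0 and 2.0 in the same batch is returned once, with distance 2.0.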

Semantics for the opclass author:

- Leaf consistent functions see one sub-entry at a time, not the full
value, so most strategies must set recheck=true. Strategies that
are exact per-component (for multiranges: OVERLAPS and
CONTAINS_ELEM) can skip it.

- Internal-node keys are unions of sub-entries drawn from many heap
tuples, so containment-style strategies (CONTAINS, EQ) must relax
to OVERLAPS during descent: a matching value's components may live
in multiple subtrees, and requiring the union key to fully contain
the query would prune valid ones.
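Here is a toy illustration of why CONTAINS has to relax to OVERLAPS on internal nodes, using integer ranges [lo, hi). The function names are hypothetical. Take the query {[1,3), [10,12)} and a matching value whose two components were placed under internal keys [0,5) and [8,15):

- Neither key contains the whole query, so the strict check would prune both subtrees and miss the row.
- The relaxed check visits both subtrees and leaves the final decision to the leaf recheck.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy half-open integer range [lo, hi); hypothetical stand-in for a range key. */
typedef struct { int lo, hi; } Rng;

static bool rng_overlaps(Rng a, Rng b) { return a.lo < b.hi && b.lo < a.hi; }
static bool rng_contains(Rng outer, Rng inner)
{
    return outer.lo <= inner.lo && inner.hi <= outer.hi;
}

/* Strict descent for "value @> query": the union key must contain every query component. */
bool internal_contains_strict(Rng key, const Rng *q, int nq)
{
    for (int i = 0; i < nq; i++)
        if (!rng_contains(key, q[i]))
            return false;
    return true;
}

/* Relaxed descent: any subtree holding a component that covers some query
 * component must overlap that component, so OVERLAPS is the safe test. */
bool internal_contains_relaxed(Rng key, const Rng *q, int nq)
{
    for (int i = 0; i < nq; i++)
        if (rng_overlaps(key, q[i]))
            return true;
    return false;
}
```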

Other design decisions:

- extractValue is fully optional; existing opclasses are unaffected.
- Multi-entry is restricted to single-key-column indexes (INCLUDE
columns are fine); multi-column support can be added later.
- Index-only scans are disabled on a multi-entry key column, since
the stored sub-entries do not represent the original datum.
- For SP-GiST, compress becomes optional when extractValue is present:
extractValue produces the leaf-typed values directly, and leafType
may differ from the input type (e.g., anymultirange -> anyrange).
- Since leaf consistent functions see components, a separate opclass
(multirange_me_ops) is provided alongside the existing multirange_ops.

Patch overview
--------------

0001 - GiST AM infrastructure (gist.h, gist_private.h, gist.c,
gistbuild.c, gistget.c, gistscan.c, gistutil.c, gistvalidate.c): new
extractValue support function (proc 13), multi-entry insert/build paths,
TID deduplication during scans using simplehash.

0002 - Built-in GiST multirange_me_ops opclass (rangetypes_gist.c, catalog
files): decomposes multiranges into component ranges, with multi-entry
consistent functions. Empty multiranges are stored as empty range
sentinels to remain visible to operator queries. Regression tests verify
index correctness against sequential scan results for all operators across
index scan and bitmap scan, including empty range/multirange edge cases,
multi-column restriction, and NULL handling.

0003 - SP-GiST AM infrastructure (spgist.h, spgist_private.h, spginsert.c,
spgscan.c, spgvalidate.c, spgutils.c): new extractValue support function
(proc 8), multi-entry insert/build paths, TID deduplication during scans
using simplehash, compress made optional when extractValue is present.

0004 - Built-in SP-GiST multirange_me_ops opclass (rangetypes_spgist.c,
catalog files): reuses the range quad-tree structure with new config,
inner consistent, and leaf consistent functions that handle range,
multirange, and element query types. Marked non-default, consistent
with the GiST multirange_me_ops. Regression tests verify index
correctness against sequential scan results for all operators across
index scan and bitmap scan.

Quick benchmark
---------------

100k multiranges with wide gaps: {[g, g+10), [g+100000, g+100010)}.
Query: mr @> 100000 (matches 10 rows; value falls in the gap for most).

CREATE TABLE bench_mr (mr int4multirange);
INSERT INTO bench_mr
SELECT int4multirange(int4range(g, g+10), int4range(g+100000, g+100010))
FROM generate_series(1, 100000) g;

Method                       Exec time  Buffers  Rechecks
Sequential scan               7.732 ms      834         -
GiST multirange_ops (std)     9.504 ms     2311     99990
GiST multirange_me_ops        0.056 ms        6         0
SP-GiST multirange_me_ops     0.112 ms       27         0

This is a worst case for the standard opclass: the wide gap between
components produces a bounding range [g, g+100010) that covers the query
point for nearly every row, requiring 99990 heap rechecks and making it
slower than a sequential scan. Multi-entry indexes store each component
range separately, eliminating all false positives. The improvement grows
with gap width; for multiranges with little or no gap the standard
opclass performs comparably.
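The recheck count in the table follows directly from the data set. The short C check below is illustrative only, and count_bench is a hypothetical helper. It replays g = 1..100000 against the point 100000:

- Every row's bounding range [g, g+100010) covers the point.
- Only the 10 rows with 99991 <= g <= 100000 have a component that does.

That leaves 100000 - 10 = 99990 rechecks.

```c
#include <assert.h>
#include <stdbool.h>

/* Half-open integer range [lo, hi) membership test. */
static bool covers(long lo, long hi, long x) { return lo <= x && x < hi; }

/*
 * Replays the benchmark rows {[g, g+10), [g+100000, g+100010)},
 * g = 1..100000, against the point query x and counts:
 *   bounded: rows whose bounding range [g, g+100010) covers x
 *            (all that multirange_ops can see at the index level)
 *   exact:   rows where some component range covers x (true matches)
 */
void count_bench(long x, long *bounded, long *exact)
{
    *bounded = *exact = 0;
    for (long g = 1; g <= 100000; g++) {
        if (covers(g, g + 100010, x))
            (*bounded)++;
        if (covers(g, g + 10, x) || covers(g + 100000, g + 100010, x))
            (*exact)++;
    }
}
```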

[1]: https://www.pgevents.ca/events/pgconfdev2024/schedule/session/171-multi-entry-generalized-search-trees/

--
Maxime Schoemans

Maxime Schoemans

maxime.schoemans@enterprisedb.com

2 months ago

In reply to: Maxime Schoemans (#1)

Re: Multi-Entry Indexing for GiST & SP-GiST

This time with the patches attached.

Best,
Maxime

Andrey Borodin

amborodin@acm.org

about 2 months ago

In reply to: Maxime Schoemans (#2)

Re: Multi-Entry Indexing for GiST & SP-GiST

On 21 May 2026, at 22:34, Maxime Schoemans <maxime.schoemans@enterprisedb.com> wrote:

patches attached

Hi Maxime,

I have been reading through the patch set. I will focus on the GiST side
here - I know the SP-GiST internals far less well. So I would rather
discuss the architecture where I can actually be useful.

Skipping dedup for non-duplicated entries
------------------------------------------

On the scan path, once an opclass has extractValue, every leaf entry
goes through the TID hash even when the indexed value produced a single
sub-entry and therefore cannot collide. GiST scans are CPU-bound (we
examine every tuple on the page and run consistent on each), so this
probe lands on the hot path rather than being hidden behind I/O.

Since multi-entry is gated on a new, non-default opclass, no existing
index takes this path, so the leaf format for these opclasses is
effectively new and free to extend. INDEX_AM_RESERVED_BIT (0x2000 in
t_info) is reserved for exactly this kind of use and is currently unused anywhere
in the backend. We could set it at insert/build time only when extractValue
returns nentries > 1, and skip the hash on scan for entries without the
bit; the hash then grows only with genuinely multiplied TIDs. I am not
proposing it as a must, just noting the format is new enough to allow it.
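Here is a minimal sketch of the reserved-bit idea. The mask values mirror PostgreSQL's access/itup.h: size in the low 13 bits, then the AM-reserved, var-width and null bits. The helper names are hypothetical. Real code would operate on an IndexTuple's t_info rather than a bare uint16_t.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local copies of the t_info bit layout from PostgreSQL's access/itup.h. */
#define TOY_INDEX_SIZE_MASK        0x1FFF
#define TOY_INDEX_AM_RESERVED_BIT  0x2000
#define TOY_INDEX_VAR_MASK         0x4000
#define TOY_INDEX_NULL_MASK        0x8000

/* Insert/build time (hypothetical helper): mark only multiplied TIDs. */
uint16_t toy_mark_multi(uint16_t t_info, int32_t nentries)
{
    assert(nentries >= 1);
    if (nentries > 1)
        t_info |= TOY_INDEX_AM_RESERVED_BIT;
    return t_info;
}

/* Scan time (hypothetical helper): only marked tuples need the TID hash probe. */
bool toy_needs_dedup(uint16_t t_info)
{
    return (t_info & TOY_INDEX_AM_RESERVED_BIT) != 0;
}
```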

One related concern: I am not a big fan of the single-key-column
restriction. Features like this should be orthogonal to the rest of the
AM, and "throws an error on more than one column" tends to calcify into a
permanent limitation rather than a temporary one.

BTW, the sorted build ignores extractValue, but that's not very important at
the current stage.

extractValue == new compress
----------------------------

What strikes me in the catalog is that multirange_me_ops drops the
compress support proc (3) and adds extractValue (13), while multirange_ops
is the reverse. So extractValue already supplants compress here: it emits
leaf-typed values directly. Conceptually compress is just extractValue
constrained to nentries == 1, and the SP-GiST side already makes compress
optional when extractValue is present, which points at the same overlap.

Was unifying the two considered, rather than carrying two parallel
support procs? For example a single "produce leaf entries" entry point,
with a 1->1 shim over compress for the existing opclasses. That would
keep the insert/build path single rather than branching on whether
extractValue exists, and it would frame multi-entry as a generalization
of what compress already does rather than a parallel mechanism.
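One way to picture the proposed unification is the hypothetical sketch below, which is not anything in the patch. A 1->1 compress is wrapped in a shim that has the extractValue shape, so the insert/build path can always loop over *nentries leaf entries. InputValue, LeafKey, bounding_compress and shim_extract are illustrative names only.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Toy key types (hypothetical): an input value and its stored leaf form. */
typedef struct { int lo, hi; } LeafKey;
typedef struct { int nparts; const LeafKey *parts; } InputValue;

/* A classic 1->1 compress signature in this toy model. */
typedef LeafKey (*CompressFn)(const InputValue *v);

/* Example compress: collapse the whole value to its bounding key. */
LeafKey bounding_compress(const InputValue *v)
{
    LeafKey k;

    assert(v->nparts > 0);
    k = v->parts[0];
    for (int i = 1; i < v->nparts; i++) {
        if (v->parts[i].lo < k.lo) k.lo = v->parts[i].lo;
        if (v->parts[i].hi > k.hi) k.hi = v->parts[i].hi;
    }
    return k;
}

/* Hypothetical unified "produce leaf entries" entry point for opclasses
 * that only define compress: always exactly one entry. */
LeafKey *shim_extract(CompressFn compress, const InputValue *v, int32_t *nentries)
{
    LeafKey *out = malloc(sizeof(LeafKey));

    if (out == NULL)
        abort();
    out[0] = compress(v);
    *nentries = 1;
    return out;
}
```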

Is this useful to PostGIS?
--------------------------

The motivation that matters most to me is whether the real heavy users of
GiST will adopt this. Multiranges are a fairly narrow audience on their
own; the compelling case is multi-part geometries (MultiPolygon with
holes, routes, regions with exclaves), which is PostGIS territory.

I am adding Darafei and Paul to CC - it would be very helpful to
hear whether PostGIS would actually use extractValue in their GiST
opclasses, and whether the single-column restriction or the per-entry
dedup cost would be a problem in practice for them. If the GIS side is
on board, the feature is clearly worth it. If not, it is worth knowing
that when designing the AM-level machinery.

Best regards, Andrey Borodin.

Matthias van de Meent

boekewurm+postgres@gmail.com

about 2 months ago

In reply to: Maxime Schoemans (#1)

Re: Multi-Entry Indexing for GiST & SP-GiST

On Wed, 20 May 2026 at 10:45, Maxime Schoemans
<maxime.schoemans@enterprisedb.com> wrote:

Hi hackers,

This patch set adds multi-entry support to the GiST and SP-GiST access
methods, allowing a single heap tuple to produce multiple index entries --
similar to how GIN decomposes values via extractValue, but within the
R-tree (GiST) and quad-tree (SP-GiST) frameworks.

I think the attachments were lost in transmission.

Motivation
----------

The existing GiST multirange opclass compresses a multirange into its
bounding union range, losing information about gaps. This makes operators
like @> and <@ imprecise at the index level, requiring many false-positive
rechecks. Multi-entry GiST instead stores each component range as a
separate index entry, giving the R-tree much tighter bounds and
significantly reducing rechecks for multiranges with gaps.

Cool! I'm not sure this is preferable over a specialized AM
(comparable to what GIN is to btree), but cool!

Other design decisions:

[...]

- Index-only scans are disabled on a multi-entry key column, since
the stored sub-entries do not represent the original datum.

How is this done? Surely it's not through a planner check of GiST opclasses?

- For SP-GiST, compress becomes optional when extractValue is present:
extractValue produces the leaf-typed values directly, and leafType
may differ from the input type (e.g., anymultirange -> anyrange).

I don't think it's generally safe to remove compression even when
extractValue is present.

- Since leaf consistent functions see components, a separate opclass
(multirange_me_ops) is provided alongside the existing multirange_ops.

Patch overview
--------------

0001 - GiST AM infrastructure (gist.h, gist_private.h, gist.c,
gistbuild.c, gistget.c, gistscan.c, gistutil.c, gistvalidate.c): new
extractValue support function (proc 13), multi-entry insert/build paths,
TID deduplication during scans using simplehash.

Could you help me understand how this deduplication works? Wouldn't
this possibly use ~ unbounded memory (or however much memory is needed
to store up to 2^48 TIDs)?

GIN gets away with it because it uses a lossy bitmap structure, and
because it traverses the TID space linearly, but we can't use lossy
checks for amgettuple() without losing correctness, and I don't think
GiST can do the same.

Kind regards,

Matthias van de Meent

Andrey Borodin

amborodin@acm.org

about 2 months ago

In reply to: Matthias van de Meent (#4)

Re: Multi-Entry Indexing for GiST & SP-GiST

On 1 Jun 2026, at 20:35, Matthias van de Meent <boekewurm+postgres@gmail.com> wrote:

Could you help me understand how this deduplication works? Wouldn't
this possibly use ~ unbounded memory (or however much memory is needed
to store up to 2^48 TIDs)?

For practical index scans this should not matter much.

For impractical ones, we can guarantee the result by spilling to a unique
tuplestore when the hashtable/ART grows over work_mem.

Best regards, Andrey Borodin.

Maxime Schoemans

maxime.schoemans@enterprisedb.com

about 2 months ago

In reply to: Andrey Borodin (#5)

Re: Multi-Entry Indexing for GiST & SP-GiST

Hi Andrey, hi Matthias,

Thank you both for the reviews. A lot of the points overlap, so I'm
answering both here and grouping by topic instead of by message. v2 is
attached, with the changes I describe below. The first four patches are
the proposal. The last two are a separate suggestion for the sorted build,
which I'd be happy to drop if you'd rather leave it for later.

-- Multi-entry in core vs. a specialized AM

Matthias:

I'm not sure this is preferable over a specialized AM (comparable to
what GIN is to btree), but cool!

The prototype I presented at PGConf.dev 2024 (MEST) was a separate forked
AM, so I did start there. The trouble is that multi-entry as it stands adds
no new on-disk structure. It reuses the existing GiST R-tree (and SP-GiST
quad-tree) unchanged and just feeds it decomposed entries, so a separate AM
would duplicate almost all of GiST/SP-GiST with little of its own. GIN can
justify its own AM because it adds posting lists on top of a btree.
Multi-entry could grow something comparable one day (merging near-duplicate
entries, with recheck), but that is a much larger change and nothing here
needs it. extractValue as an optional support proc reuses all the existing
machinery, is opt-in per opclass, and costs existing opclasses nothing.

Moving it into core also helps adoption. Heavy GiST users like PostGIS are
more likely to add an extractValue function to their opclasses than to
depend on an out-of-tree AM.

-- extractValue vs. compress

Andrey:

Conceptually compress is just extractValue constrained to nentries == 1
[...] Was unifying the two considered, rather than carrying two parallel
support procs?

Matthias:

I don't think it's generally safe to remove compression even when
extractValue is present.

Two reasons I kept them separate.

First, unifying them would mean changing compress's signature, which is
strictly 1->1 in both AMs (GiST GISTENTRY* -> GISTENTRY*, SP-GiST input
-> leaf), so every existing opclass with a compress function would break.

Second, they do two different jobs and they compose. extractValue splits a
value into leaf entries, and compress decides how each one is stored. When
an opclass has both, the insert/build path runs every extracted entry
through the normal tuple-forming path, so compress still runs on each entry
and is never dropped when present (re Matthias's point). We only omit it when the split
already produces leaf-typed values, as multirange_me_ops does. It returns
ranges directly, so there is nothing for compress to do. An opclass that
wanted both could decompose with extractValue and compress each part. A
GeometryCollection opclass, for example, could have extractValue return the
member geometries, and compress reduce each one to its bounding box for
storage.
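The GeometryCollection example can be sketched with toy types. Pt, Geom, GeomCollection and Box are hypothetical, not PostGIS structures. In the sketch:

- extractValue splits the collection into its member geometries.
- compress then runs once per member to produce its bounding box.
- Every resulting leaf key would point at the same heap TID.

```c
#include <assert.h>

/* Toy geometry model (hypothetical, not PostGIS types). */
typedef struct { double x, y; } Pt;
typedef struct { int npts; const Pt *pts; } Geom;              /* one member */
typedef struct { int ngeoms; const Geom *geoms; } GeomCollection;
typedef struct { double xmin, ymin, xmax, ymax; } Box;

/* extractValue step: split the collection into its members (borrowed, not copied). */
const Geom *toy_extract_members(const GeomCollection *gc, int *nentries)
{
    *nentries = gc->ngeoms;
    return gc->geoms;
}

/* compress step, run once per extracted entry: reduce a member to its bbox. */
Box toy_compress_bbox(const Geom *g)
{
    Box b;

    assert(g->npts > 0);
    b = (Box) {g->pts[0].x, g->pts[0].y, g->pts[0].x, g->pts[0].y};
    for (int i = 1; i < g->npts; i++) {
        if (g->pts[i].x < b.xmin) b.xmin = g->pts[i].x;
        if (g->pts[i].y < b.ymin) b.ymin = g->pts[i].y;
        if (g->pts[i].x > b.xmax) b.xmax = g->pts[i].x;
        if (g->pts[i].y > b.ymax) b.ymax = g->pts[i].y;
    }
    return b;
}

/* Insert path: every extracted entry goes through compress; returns entry count. */
int toy_form_leaf_keys(const GeomCollection *gc, Box *out)
{
    int n;
    const Geom *members = toy_extract_members(gc, &n);

    for (int i = 0; i < n; i++)
        out[i] = toy_compress_bbox(&members[i]);
    return n;
}
```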

-- TID deduplication and scan memory

Matthias:

Could you help me understand how this deduplication works? Wouldn't
this possibly use ~ unbounded memory [...]? [...] we can't use lossy
checks for amgettuple() without losing correctness

Andrey:

[...] set INDEX_AM_RESERVED_BIT [...] only when extractValue returns
nentries > 1, and skip the hash on scan for entries without the bit

For bitmap scans there is no extra memory: the TID bitmap deduplicates on
its own. The hash is only used for amgettuple (plain and KNN ordered
scans), and you are right that it cannot go lossy without breaking
amgettuple, so the GIN trick does not carry over.

Two things keep it bounded. v2 adopts Andrey's reserved-bit idea, so the
hash only ever holds TIDs that extractValue actually multiplied (the bit is
set only when nentries > 1), and single-entry values skip it entirely. And
the planner only chooses amgettuple when it expects few matching rows. A
query that would return many rows gets a bitmap scan instead, where the TID
bitmap deduplicates for free, so the hash only grows when the result is
already expected to be small. As Andrey said, for practical scans this
should not be a concern.

For the worst case there are ways to bound it, like Andrey's suggestion of
spilling to a unique tuplestore once the hash goes over work_mem. v2 does
not do this yet. Matthias, are you thinking along those lines, or did you
have another idea in mind? And does it need handling before this goes in,
or is it fine as a follow-up?

-- Index-only scans

Matthias:

[Index-only scans are disabled on a multi-entry key column.] How is
this done? Surely it's not through a planner check of GiST opclasses?

No planner check. It's amcanreturn (gistcanreturn), which returns false
for a key column that has an extractValue proc, since the stored
sub-entries don't represent the original datum. INCLUDE columns stay
returnable.

-- Single-key-column restriction

Andrey:

I am not a big fan of the single-key-column restriction [...] "throws an
error on more than one column" tends to calcify into a permanent
limitation

Fair point. The restriction is not fundamental, I just wanted to keep the
first version simple. v2 lifts it. It allows several key columns as long
as only one of them has extractValue, and duplicates the other key
columns across the extracted entries the same way we already do for
INCLUDE columns. The one thing to watch is that this can make the index
much larger, since every extracted entry then carries a copy of the other
columns. That is already true for INCLUDE columns today.
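Here is a hypothetical sketch of the v2 multi-column behavior for an index on (multirange column, tenant_id), where only the first key column has extractValue. The second key column is copied unchanged into every extracted entry, together with the heap TID. The column and type names are illustrative only. The storage cost shows up directly: a value with k components produces k tuples, and each carries its own copy of the other columns.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { int lo, hi; } Rng;                 /* toy component range */
typedef struct { unsigned block, offset; } Tid;     /* toy heap TID */

/* One leaf tuple of a hypothetical (mr_component, tenant_id) index. */
typedef struct { Rng component; int tenant_id; Tid tid; } LeafTuple;

/* Only the first key column is decomposed; the second key column is
 * duplicated into every extracted entry, as INCLUDE columns already are. */
size_t form_leaf_tuples(const Rng *components, size_t ncomponents,
                        int tenant_id, Tid tid, LeafTuple *out)
{
    for (size_t i = 0; i < ncomponents; i++)
        out[i] = (LeafTuple) {components[i], tenant_id, tid};
    return ncomponents;
}
```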

-- Sorted build

Andrey:

BTW, the sorted build ignores extractValue.

You're right. It was latent: the sorted path is only chosen when every
key column has sortsupport, which the multirange opclasses lack, so it
isn't reached today. To stop it from biting a future opclass that has both, in
v2 the main GiST patch (0001) refuses the sorted path when extractValue is
present rather than silently building a single-entry index.

Patches 0005 and 0006 are a possible implementation of the real support,
sent as a separate suggestion. 0005 lets tuplesort_putindextuplevalues set
the reserved bit so it survives the sort, and 0006 makes the sorted build
call extractValue. The bit has to ride through the sort because, unlike
btree posting lists, a heap tuple's entries are scattered by key and can't
be regrouped afterwards. No in-core opclass has both procs, so the tests
don't reach this, but I checked it by hand by giving multirange_me_ops a
range sortsupport and comparing scans against a sequential scan.
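A toy model shows why the flag has to be set before sorting. SortItem and feed_and_sort are hypothetical names, and qsort stands in for tuplesort. Once entries are ordered by key, the two entries of row 0 end up on either side of row 1's entry. Nothing after the sort could regroup them to work out which TIDs were multiplied.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy sort item (hypothetical): key, heap TID, and the "multiplied" flag. */
typedef struct { int key; unsigned tid; bool multi; } SortItem;

static int by_key(const void *a, const void *b)
{
    int ka = ((const SortItem *) a)->key, kb = ((const SortItem *) b)->key;
    return (ka > kb) - (ka < kb);
}

/*
 * Feed every extracted entry to the sort with its flag already set
 * (nentries > 1 for its row), then sort by key. The flag must travel with
 * each entry because one row's entries are no longer adjacent afterwards.
 */
size_t feed_and_sort(const int *const *values, const int *nvalues,
                     size_t nrows, SortItem *out)
{
    size_t n = 0;

    for (size_t row = 0; row < nrows; row++)
        for (int i = 0; i < nvalues[row]; i++)
            out[n++] = (SortItem) {values[row][i], (unsigned) row, nvalues[row] > 1};
    qsort(out, n, sizeof(SortItem), by_key);
    return n;
}
```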

Worth carrying these two, or better left for later?

-- PostGIS

Andrey:

the compelling case is multi-part geometries (MultiPolygon with holes,
routes, regions with exclaves) [...] I am adding Darafei and Paul to CC

Agreed, I expect PostGIS geometries could be a good case for multi-entry.

Darafei, Paul: do you know specific cases where it would help? I'm
thinking of things like multi-polygons whose parts are far apart, or long
linestrings, where the single bounding box ends up very loose. But you
would know far better than me where that actually comes up in practice.

Regards,
Maxime

Alexander Nestorov

alexandernst@gmail.com

about 1 month ago

In reply to: Maxime Schoemans (#6)

Re: Multi-Entry Indexing for GiST & SP-GiST

Hello Maxime!

I'm currently working on an improvement to btree_gist, and Andrey suggested that I could ask you to check my proposal [0], since you're modifying related code. Would you be open to it? Any feedback would help me!

Regards

[0]: /messages/by-id/36b4f67d-5975-452c-a6b8-b6407f0924ee@Spark

Matthias van de Meent

boekewurm+postgres@gmail.com

about 1 month ago

In reply to: Maxime Schoemans (#6)

Re: Multi-Entry Indexing for GiST & SP-GiST

On Fri, 5 Jun 2026 at 00:31, Maxime Schoemans
<maxime.schoemans@enterprisedb.com> wrote:

Hi Andrey, hi Matthias,

Thank you both for the reviews. A lot of the points overlap, so I'm
answering both here and grouping by topic instead of by message. v2 is
attached, with the changes I describe below. The first four patches are
the proposal. The last two are a separate suggestion for the sorted build,
which I'd be happy to drop if you'd rather leave it for later.

-- Multi-entry in core vs. a specialized AM

Matthias:

I'm not sure this is preferable over a specialized AM (comparable to
what GIN is to btree), but cool!

The prototype I presented at PGConf.dev 2024 (MEST) was a separate forked
AM, so I did start there. The trouble is that multi-entry as it stands adds
no new on-disk structure. It reuses the existing GiST R-tree (and SP-GiST
quad-tree) unchanged and just feeds it decomposed entries, so a separate AM
would duplicate almost all of GiST/SP-GiST with little of its own.

I'm not trying to say that you need to fork the AM; that's not my point.
What I am trying to say is that by adding multi-entry, a lot of edge
cases need to be considered and added to the GiST AM definition
itself. A separate IndexAmRoutine which defines the multi-entry
variant would make more sense to me, even if they were to share most
of their internals and physical format definitions.

GIN can
justify its own AM because it adds posting lists on top of a btree.

It (critically) also contains its own internal expressions with a
planning/execution engine, which is much more complicated than much of
what the nbtree code needs to do.

Multi-entry could grow something comparable one day (merging near-duplicate
entries, with recheck), but that is a much larger change and nothing here
needs it.

I'd say it's important to keep disk usage in mind with this type of
change. Do all the users who want this feature also want it to use that
much more disk?

extractValue as an optional support proc reuses all the existing
machinery, is opt-in per opclass, and costs existing opclasses nothing.

Moving it into core also helps adoption. Heavy GiST users like PostGIS are
more likely to add an extractValue function to their opclasses than to
depend on an out-of-tree AM.

Agreed, but that assumes this separate AM would be out-of-tree.

-- TID deduplication and scan memory

Matthias:

Could you help me understand how this deduplication works? Wouldn't
this possibly use ~ unbounded memory [...]? [...] we can't use lossy
checks for amgettuple() without losing correctness

Andrey:

[...] set INDEX_AM_RESERVED_BIT [...] only when extractValue returns
nentries > 1, and skip the hash on scan for entries without the bit

For bitmap scans there is no extra memory: the TID bitmap deduplicates on
its own. The hash is only used for amgettuple (plain and KNN ordered
scans), and you are right that it cannot go lossy without breaking
amgettuple, so the GIN trick does not carry over.

Two things keep it bounded. v2 adopts Andrey's reserved-bit idea, so the
hash only ever holds TIDs that extractValue actually multiplied (the bit is
set only when nentries > 1), and single-entry values skip it entirely. And
the planner only chooses amgettuple when it expects few matching rows.

Even in KNN plans when the ordering operator is very expensive? Note
that "few" is a relative term; an index scan will still be selected if
a full tuplesort becomes more expensive than random IOs caused by KNN
index scans.

For the worst case there are ways to bound it, like Andrey's suggestion of
spilling to a unique tuplestore once the hash goes over work_mem.

That'd be at the cost of very expensive IOs for ~ every tuple scanned,
so I'm not sure that's such a great idea.

v2 does
not do this yet. Matthias, are you thinking along those lines, or did you
have another idea in mind? And does it need handling before this goes in,
or is it fine as a follow-up?

I think it should be handled in some way or another by the time this
patch gets committed. Exceeding work_mem isn't exactly a great
solution, and putting TIDs in a hashtable (even a simplehash) wastes
memory.

A nit about memory: If you swap the order of GISTTIDHashEntry's hash
and status fields, the size of the struct drops to 12B from its
current 16B; it's probably worth benchmarking to see if the lack of
power-of-two alignment is worth the increased memory efficiency.
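The 16B vs. 12B figures follow from ordinary C alignment rules. The real GISTTIDHashEntry definition is not quoted in the thread, so the field list below is an assumption:

- an ItemPointerData-like TID (three uint16 fields, 6 bytes, 2-byte aligned);
- a uint32 cached hash, as simplehash can keep with SH_STORE_HASH;
- the one-byte status field that simplehash requires.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy copy of ItemPointerData's layout: three uint16 fields, 6 bytes, 2-byte aligned. */
typedef struct { uint16_t bi_hi, bi_lo, ip_posid; } ToyItemPointer;

/* Hypothetical field orders for a simplehash entry keyed by TID. */
typedef struct { ToyItemPointer tid; uint32_t hash; char status; } EntryHashFirst;
typedef struct { ToyItemPointer tid; char status; uint32_t hash; } EntryStatusFirst;
```

On common ABIs such as x86-64 and AArch64, the hash-first order puts hash at offset 8 and pads the struct to 16 bytes. The status-first order fits status into the byte right after the TID and ends at 12 bytes.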

-- Index-only scans

Matthias:

[Index-only scans are disabled on a multi-entry key column.] How is
this done? Surely it's not through a planner check of GiST opclasses?

No planner check. It's amcanreturn (gistcanreturn), which returns false
for a key column that has an extractValue proc, since the stored
sub-entries don't represent the original datum. INCLUDE columns stay
returnable.

Hmm, yes. I assume that INCLUDE columns' values are duplicated across
entries, too? That sounds rather storage-expensive, but if the user is
willing to pay that cost without complaints, who am I to judge...

Kind regards,

Matthias van de Meent
Databricks (https://www.databricks.com)
