[PATCH] Covering SPGiST index
Hi, hackers!
I'd like to propose a patch which introduces a functionality to include
additional columns to SPGiST index to increase speed of queries containing
them due to making the scans index only in this case. To date this
functionality was available in GiSt and btree, I suppose the same is useful
in SPGiST also.
A few words on realisaton:
1. The patch is intended to be fully compatible with previous SPGiSt
indexes so SpGist leaf tuple structure remains unchanged until the ending
of key attribute. All changes are introduced only after it. Internal tuples
remain unchanged at all.
2. Included data is added in the form very similar to heap tuple but unlike
the later it should not start from MAXALIGN boundary. I.e. nulls mask (if
exist) starts just after the key value (it doesn't need alignment). Each of
included attributes start from their own typealign boundary. The goal is to
make leaf tuples and therefore index more compact.
3. Leaf tuple header is modified to store additional per tuple flags:
a) is nullmask present - if there is at least one null value among included
attributes of a tuple
(Note that this nullmask apply only to include attributes as nulls
management for key attributes is already realised in SPGiSt by placing
leafs with null keys in separate list not in the main index tree.)
b) is there variable length values among included. If there is no and key
attribute is also fixed-length e.g. (kd-tree, quad-tree etc.) then leaf
tuple processing can be speed up using attcacheoff.
These bits are incorporated into unused higher bits of nextOffset in the
header SPGiST leaf tuple. Even if we have 64Kb pages and tuples of minimum
12 bytes (the length of the header on 32-bit architectures) + 4 bytes
ItemIdData 14 bit for nextOffset is more than enough.
All this changes only affect private index structures so all outside
behavior like WAL, vacuum etc will remain unchanged.
As usual I very much appreciate your feedback
--
Best regards,
Pavel Borisov
Postgres Professional: http://postgrespro.com <http://www.postgrespro.com>
Attachments:
spgist-covering-0001.diffapplication/octet-stream; name=spgist-covering-0001.diffDownload
diff --git a/src/backend/access/spgist/spgdoinsert.c b/src/backend/access/spgist/spgdoinsert.c
index 934d65b89f..b767a805fa 100644
--- a/src/backend/access/spgist/spgdoinsert.c
+++ b/src/backend/access/spgist/spgdoinsert.c
@@ -22,7 +22,7 @@
#include "miscadmin.h"
#include "storage/bufmgr.h"
#include "utils/rel.h"
-
+#include "access/htup_details.h"
/*
* SPPageDesc tracks all info about a page we are inserting into. In some
@@ -220,7 +220,7 @@ addLeafTuple(Relation index, SpGistState *state, SpGistLeafTuple leafTuple,
SpGistBlockIsRoot(current->blkno))
{
/* Tuple is not part of a chain */
- leafTuple->nextOffset = InvalidOffsetNumber;
+ SGLT_SET_OFFSET (leafTuple->nextOffset, InvalidOffsetNumber);
current->offnum = SpGistPageAddNewItem(state, current->page,
(Item) leafTuple, leafTuple->size,
NULL, false);
@@ -253,7 +253,7 @@ addLeafTuple(Relation index, SpGistState *state, SpGistLeafTuple leafTuple,
PageGetItemId(current->page, current->offnum));
if (head->tupstate == SPGIST_LIVE)
{
- leafTuple->nextOffset = head->nextOffset;
+ SGLT_SET_OFFSET(leafTuple->nextOffset, SGLT_GET_OFFSET(head->nextOffset));
offnum = SpGistPageAddNewItem(state, current->page,
(Item) leafTuple, leafTuple->size,
NULL, false);
@@ -264,14 +264,14 @@ addLeafTuple(Relation index, SpGistState *state, SpGistLeafTuple leafTuple,
*/
head = (SpGistLeafTuple) PageGetItem(current->page,
PageGetItemId(current->page, current->offnum));
- head->nextOffset = offnum;
+ SGLT_SET_OFFSET(head->nextOffset, offnum);
xlrec.offnumLeaf = offnum;
xlrec.offnumHeadLeaf = current->offnum;
}
else if (head->tupstate == SPGIST_DEAD)
{
- leafTuple->nextOffset = InvalidOffsetNumber;
+ SGLT_SET_OFFSET(leafTuple->nextOffset,InvalidOffsetNumber);
PageIndexTupleDelete(current->page, current->offnum);
if (PageAddItem(current->page,
(Item) leafTuple, leafTuple->size,
@@ -362,13 +362,13 @@ checkSplitConditions(Relation index, SpGistState *state,
{
/* We could see a DEAD tuple as first/only chain item */
Assert(i == current->offnum);
- Assert(it->nextOffset == InvalidOffsetNumber);
+ Assert(SGLT_GET_OFFSET(it->nextOffset) == InvalidOffsetNumber);
/* Don't count it in result, because it won't go to other page */
}
else
elog(ERROR, "unexpected SPGiST tuple state: %d", it->tupstate);
- i = it->nextOffset;
+ i = SGLT_GET_OFFSET(it->nextOffset);
}
*nToSplit = n;
@@ -437,7 +437,7 @@ moveLeafs(Relation index, SpGistState *state,
{
/* We could see a DEAD tuple as first/only chain item */
Assert(i == current->offnum);
- Assert(it->nextOffset == InvalidOffsetNumber);
+ Assert(SGLT_GET_OFFSET(it->nextOffset) == InvalidOffsetNumber);
/* We don't want to move it, so don't count it in size */
toDelete[nDelete] = i;
nDelete++;
@@ -446,7 +446,7 @@ moveLeafs(Relation index, SpGistState *state,
else
elog(ERROR, "unexpected SPGiST tuple state: %d", it->tupstate);
- i = it->nextOffset;
+ i = SGLT_GET_OFFSET(it->nextOffset);
}
/* Find a leaf page that will hold them */
@@ -475,7 +475,7 @@ moveLeafs(Relation index, SpGistState *state,
* don't care). We're modifying the tuple on the source page
* here, but it's okay since we're about to delete it.
*/
- it->nextOffset = r;
+ SGLT_SET_OFFSET(it->nextOffset,r);
r = SpGistPageAddNewItem(state, npage, (Item) it, it->size,
&startOffset, false);
@@ -490,7 +490,7 @@ moveLeafs(Relation index, SpGistState *state,
}
/* add the new tuple as well */
- newLeafTuple->nextOffset = r;
+ SGLT_SET_OFFSET(newLeafTuple->nextOffset, r);
r = SpGistPageAddNewItem(state, npage,
(Item) newLeafTuple, newLeafTuple->size,
&startOffset, false);
@@ -709,6 +709,9 @@ doPickSplit(Relation index, SpGistState *state,
int nToDelete,
nToInsert,
maxToInclude;
+ Datum *leafChainDatums;
+ bool *leafChainIsnulls;
+ const int natts = IndexRelationGetNumberOfAttributes(index);
in.level = level;
@@ -723,14 +726,16 @@ doPickSplit(Relation index, SpGistState *state,
toInsert = (OffsetNumber *) palloc(sizeof(OffsetNumber) * n);
newLeafs = (SpGistLeafTuple *) palloc(sizeof(SpGistLeafTuple) * n);
leafPageSelect = (uint8 *) palloc(sizeof(uint8) * n);
-
STORE_STATE(state, xlrec.stateSrc);
+ leafChainDatums = (Datum *) palloc(n*natts*sizeof(Datum));
+ leafChainIsnulls = (bool *) palloc(n*natts*sizeof(bool));
+
/*
- * Form list of leaf tuples which will be distributed as split result;
- * also, count up the amount of space that will be freed from current.
- * (Note that in the non-root case, we won't actually delete the old
- * tuples, only replace them with redirects or placeholders.)
+ * Collect leaf tuples which will be distributed as split result; also,
+ * count up the amount of space that will be freed from current. (Note
+ * that in the non-root case, we won't actually delete the old tuples,
+ * only replace them with redirects or placeholders.)
*
* Note: the SGLTDATUM calls here are safe even when dealing with a nulls
* page. For a pass-by-value data type we will fetch a word that must
@@ -738,7 +743,14 @@ doPickSplit(Relation index, SpGistState *state,
* tuples must have size at least SGDTSIZE). For a pass-by-reference type
* we are just computing a pointer that isn't going to get dereferenced.
* So it's not worth guarding the calls with isNulls checks.
+ *
+ * Datums and isnulls of all leaf tuple attributes in a chain are collected
+ * into 2-d arrays: (number of tuples in chain) x (number of attributes)
+ * First attribute is key, the other - included attributes (if any). After
+ * picksplit we need to form new leaf tuples as key attribute length can
+ * change which can affect alignment of every include attribute.
*/
+
nToInsert = 0;
nToDelete = 0;
spaceToDelete = 0;
@@ -759,6 +771,8 @@ doPickSplit(Relation index, SpGistState *state,
{
in.datums[nToInsert] = SGLTDATUM(it, state);
heapPtrs[nToInsert] = it->heapPtr;
+ SpGistDeformLeafTuple(it, state, leafChainDatums+nToInsert*natts,
+ leafChainIsnulls+nToInsert*natts, isNulls);
nToInsert++;
toDelete[nToDelete] = i;
nToDelete++;
@@ -784,6 +798,9 @@ doPickSplit(Relation index, SpGistState *state,
{
in.datums[nToInsert] = SGLTDATUM(it, state);
heapPtrs[nToInsert] = it->heapPtr;
+
+ SpGistDeformLeafTuple(it, state, leafChainDatums+nToInsert*natts,
+ leafChainIsnulls+nToInsert*natts, isNulls);
nToInsert++;
toDelete[nToDelete] = i;
nToDelete++;
@@ -795,7 +812,7 @@ doPickSplit(Relation index, SpGistState *state,
{
/* We could see a DEAD tuple as first/only chain item */
Assert(i == current->offnum);
- Assert(it->nextOffset == InvalidOffsetNumber);
+ Assert(SGLT_GET_OFFSET(it->nextOffset) == InvalidOffsetNumber);
toDelete[nToDelete] = i;
nToDelete++;
/* replacing it with redirect will save no space */
@@ -803,7 +820,7 @@ doPickSplit(Relation index, SpGistState *state,
else
elog(ERROR, "unexpected SPGiST tuple state: %d", it->tupstate);
- i = it->nextOffset;
+ i = SGLT_GET_OFFSET(it->nextOffset);
}
}
in.nTuples = nToInsert;
@@ -816,10 +833,17 @@ doPickSplit(Relation index, SpGistState *state,
*/
in.datums[in.nTuples] = SGLTDATUM(newLeafTuple, state);
heapPtrs[in.nTuples] = newLeafTuple->heapPtr;
+
+ SpGistDeformLeafTuple(newLeafTuple, state, leafChainDatums+(in.nTuples)*natts,
+ leafChainIsnulls+(in.nTuples)*natts, isNulls);
in.nTuples++;
memset(&out, 0, sizeof(out));
+ /*
+ * Process collected key values of tuples from the chain. Included values are used
+ * to build fresh leaf tuples unchanged.
+ */
if (!isNulls)
{
/*
@@ -837,9 +861,11 @@ doPickSplit(Relation index, SpGistState *state,
totalLeafSizes = 0;
for (i = 0; i < in.nTuples; i++)
{
- newLeafs[i] = spgFormLeafTuple(state, heapPtrs + i,
- out.leafTupleDatums[i],
- false);
+ *(leafChainDatums+i*natts)=(Datum) out.leafTupleDatums[i];
+ *(leafChainIsnulls+i*natts)=false;
+
+ newLeafs[i] = spgFormLeafTuple(state, heapPtrs + i, leafChainDatums+i*natts,
+ leafChainIsnulls+i*natts);
totalLeafSizes += newLeafs[i]->size + sizeof(ItemIdData);
}
}
@@ -860,9 +886,14 @@ doPickSplit(Relation index, SpGistState *state,
totalLeafSizes = 0;
for (i = 0; i < in.nTuples; i++)
{
- newLeafs[i] = spgFormLeafTuple(state, heapPtrs + i,
- (Datum) 0,
- true);
+ /*
+ * Nulls tree can contain only null key values.
+ */
+ *(leafChainDatums+i*natts)=(Datum) 0;
+ *(leafChainIsnulls+i*natts)=true;
+
+ newLeafs[i] = spgFormLeafTuple(state, heapPtrs + i, leafChainDatums+i*natts,
+ leafChainIsnulls+i*natts);
totalLeafSizes += newLeafs[i]->size + sizeof(ItemIdData);
}
}
@@ -1196,10 +1227,10 @@ doPickSplit(Relation index, SpGistState *state,
if (ItemPointerIsValid(&nodes[n]->t_tid))
{
Assert(ItemPointerGetBlockNumber(&nodes[n]->t_tid) == leafBlock);
- it->nextOffset = ItemPointerGetOffsetNumber(&nodes[n]->t_tid);
+ SGLT_SET_OFFSET(it->nextOffset, ItemPointerGetOffsetNumber(&nodes[n]->t_tid));
}
else
- it->nextOffset = InvalidOffsetNumber;
+ SGLT_SET_OFFSET(it->nextOffset, InvalidOffsetNumber);
/* Insert it on page */
newoffset = SpGistPageAddNewItem(state, BufferGetPage(leafBuffer),
@@ -1889,67 +1920,81 @@ spgSplitNodeAction(Relation index, SpGistState *state,
*/
bool
spgdoinsert(Relation index, SpGistState *state,
- ItemPointer heapPtr, Datum datum, bool isnull)
+ ItemPointer heapPtr, Datum *datum, bool *isnull)
{
int level = 0;
- Datum leafDatum;
+ Datum *leafDatum;
int leafSize;
SPPageDesc current,
parent;
FmgrInfo *procinfo = NULL;
+ int i;
/*
* Look up FmgrInfo of the user-defined choose function once, to save
* cycles in the loop below.
*/
- if (!isnull)
+ if (!isnull[0])
procinfo = index_getprocinfo(index, 1, SPGIST_CHOOSE_PROC);
/*
* Prepare the leaf datum to insert.
- *
- * If an optional "compress" method is provided, then call it to form the
- * leaf datum from the input datum. Otherwise store the input datum as
+ */
+
+ leafDatum = (Datum *) palloc0(sizeof(Datum) * (IndexRelationGetNumberOfAttributes(index)));
+
+ /* If an optional "compress" method is provided, then call it to form the
+ * key datum from the input datum. Otherwise store the input datum as
* is. Since we don't use index_form_tuple in this AM, we have to make
* sure value to be inserted is not toasted; FormIndexDatum doesn't
* guarantee that. But we assume the "compress" method to return an
* untoasted value.
*/
- if (!isnull)
+ if (!isnull[0])
{
if (OidIsValid(index_getprocid(index, 1, SPGIST_COMPRESS_PROC)))
{
FmgrInfo *compressProcinfo = NULL;
compressProcinfo = index_getprocinfo(index, 1, SPGIST_COMPRESS_PROC);
- leafDatum = FunctionCall1Coll(compressProcinfo,
+ leafDatum[0] = FunctionCall1Coll(compressProcinfo,
index->rd_indcollation[0],
- datum);
+ datum[0]);
}
else
{
Assert(state->attLeafType.type == state->attType.type);
if (state->attType.attlen == -1)
- leafDatum = PointerGetDatum(PG_DETOAST_DATUM(datum));
+ leafDatum[0] = PointerGetDatum(PG_DETOAST_DATUM(datum[0]));
else
- leafDatum = datum;
+ leafDatum[0] = datum[0];
}
}
else
- leafDatum = (Datum) 0;
+ leafDatum[0] = (Datum) 0;
+
+ for (i = 1; i < IndexRelationGetNumberOfAttributes(index); i++)
+ {
+ if (!isnull[i])
+ {
+ if (TupleDescAttr(state->includeTupdesc, i-1)->attlen == -1)
+ leafDatum[i] = PointerGetDatum(PG_DETOAST_DATUM(datum[i]));
+ else
+ leafDatum[i]=datum[i];
+ }
+ else
+ leafDatum[i]=(Datum) 0;
+ }
+
/*
- * Compute space needed for a leaf tuple containing the given datum.
+ * Compute space needed on a page for a leaf tuple containing the given datum.
*
* If it isn't gonna fit, and the opclass can't reduce the datum size by
* suffixing, bail out now rather than getting into an endless loop.
*/
- if (!isnull)
- leafSize = SGLTHDRSZ + sizeof(ItemIdData) +
- SpGistGetTypeSize(&state->attLeafType, leafDatum);
- else
- leafSize = SGDTSIZE + sizeof(ItemIdData);
+ leafSize = SpgLeafSize(state, leafDatum, isnull) + sizeof(ItemIdData);
if (leafSize > SPGIST_PAGE_CAPACITY && !state->config.longValuesOK)
ereport(ERROR,
@@ -1961,7 +2006,7 @@ spgdoinsert(Relation index, SpGistState *state,
errhint("Values larger than a buffer page cannot be indexed.")));
/* Initialize "current" to the appropriate root page */
- current.blkno = isnull ? SPGIST_NULL_BLKNO : SPGIST_ROOT_BLKNO;
+ current.blkno = isnull[0] ? SPGIST_NULL_BLKNO : SPGIST_ROOT_BLKNO;
current.buffer = InvalidBuffer;
current.page = NULL;
current.offnum = FirstOffsetNumber;
@@ -1995,7 +2040,7 @@ spgdoinsert(Relation index, SpGistState *state,
*/
current.buffer =
SpGistGetBuffer(index,
- GBUF_LEAF | (isnull ? GBUF_NULLS : 0),
+ GBUF_LEAF | (isnull[0] ? GBUF_NULLS : 0),
Min(leafSize, SPGIST_PAGE_CAPACITY),
&isNew);
current.blkno = BufferGetBlockNumber(current.buffer);
@@ -2037,7 +2082,7 @@ spgdoinsert(Relation index, SpGistState *state,
current.page = BufferGetPage(current.buffer);
/* should not arrive at a page of the wrong type */
- if (isnull ? !SpGistPageStoresNulls(current.page) :
+ if (isnull[0] ? !SpGistPageStoresNulls(current.page) :
SpGistPageStoresNulls(current.page))
elog(ERROR, "SPGiST index page %u has wrong nulls flag",
current.blkno);
@@ -2054,7 +2099,7 @@ spgdoinsert(Relation index, SpGistState *state,
{
/* it fits on page, so insert it and we're done */
addLeafTuple(index, state, leafTuple,
- ¤t, &parent, isnull, isNew);
+ ¤t, &parent, isnull[0], isNew);
break;
}
else if ((sizeToSplit =
@@ -2068,14 +2113,14 @@ spgdoinsert(Relation index, SpGistState *state,
* chain to another leaf page rather than splitting it.
*/
Assert(!isNew);
- moveLeafs(index, state, ¤t, &parent, leafTuple, isnull);
+ moveLeafs(index, state, ¤t, &parent, leafTuple, isnull[0]);
break; /* we're done */
}
else
{
/* picksplit */
if (doPickSplit(index, state, ¤t, &parent,
- leafTuple, level, isnull, isNew))
+ leafTuple, level, isnull[0], isNew))
break; /* doPickSplit installed new tuples */
/* leaf tuple will not be inserted yet */
@@ -2110,8 +2155,8 @@ spgdoinsert(Relation index, SpGistState *state,
innerTuple = (SpGistInnerTuple) PageGetItem(current.page,
PageGetItemId(current.page, current.offnum));
- in.datum = datum;
- in.leafDatum = leafDatum;
+ in.datum = datum[0];
+ in.leafDatum = leafDatum[0];
in.level = level;
in.allTheSame = innerTuple->allTheSame;
in.hasPrefix = (innerTuple->prefixSize > 0);
@@ -2121,7 +2166,7 @@ spgdoinsert(Relation index, SpGistState *state,
memset(&out, 0, sizeof(out));
- if (!isnull)
+ if (!isnull[0])
{
/* use user-defined choose method */
FunctionCall2Coll(procinfo,
@@ -2158,11 +2203,11 @@ spgdoinsert(Relation index, SpGistState *state,
/* Adjust level as per opclass request */
level += out.result.matchNode.levelAdd;
/* Replace leafDatum and recompute leafSize */
- if (!isnull)
+ if (!isnull[0])
{
- leafDatum = out.result.matchNode.restDatum;
- leafSize = SGLTHDRSZ + sizeof(ItemIdData) +
- SpGistGetTypeSize(&state->attLeafType, leafDatum);
+ leafDatum[0] = out.result.matchNode.restDatum;
+ leafSize = SpgLeafSize(state, leafDatum, isnull) +
+ sizeof(ItemIdData);
}
/*
@@ -2227,6 +2272,6 @@ spgdoinsert(Relation index, SpGistState *state,
SpGistSetLastUsedPage(index, parent.buffer);
UnlockReleaseBuffer(parent.buffer);
}
-
+ pfree(leafDatum);
return true;
}
diff --git a/src/backend/access/spgist/spginsert.c b/src/backend/access/spgist/spginsert.c
index e4508a2b92..b54ae85f6e 100644
--- a/src/backend/access/spgist/spginsert.c
+++ b/src/backend/access/spgist/spginsert.c
@@ -55,8 +55,7 @@ spgistBuildCallback(Relation index, ItemPointer tid, Datum *values,
* lock on some buffer. So we need to be willing to retry. We can flush
* any temp data when retrying.
*/
- while (!spgdoinsert(index, &buildstate->spgstate, tid,
- *values, *isnull))
+ while (!spgdoinsert(index, &buildstate->spgstate, tid, values, isnull))
{
MemoryContextReset(buildstate->tmpCtx);
}
@@ -226,7 +225,7 @@ spginsert(Relation index, Datum *values, bool *isnull,
* to avoid cumulative memory consumption. That means we also have to
* redo initSpGistState(), but it's cheap enough not to matter.
*/
- while (!spgdoinsert(index, &spgstate, ht_ctid, *values, *isnull))
+ while (!spgdoinsert(index, &spgstate, ht_ctid, values, isnull))
{
MemoryContextReset(insertCtx);
initSpGistState(&spgstate, index);
diff --git a/src/backend/access/spgist/spgscan.c b/src/backend/access/spgist/spgscan.c
index 4d506bfb9a..b5dedc3afd 100644
--- a/src/backend/access/spgist/spgscan.c
+++ b/src/backend/access/spgist/spgscan.c
@@ -28,7 +28,8 @@
typedef void (*storeRes_func) (SpGistScanOpaque so, ItemPointer heapPtr,
Datum leafValue, bool isNull, bool recheck,
- bool recheckDistances, double *distances);
+ bool recheckDistances, double *distances,
+ SpGistLeafTuple leafTuple);
/*
* Pairing heap comparison function for the SpGistSearchItem queue.
@@ -88,6 +89,9 @@ spgFreeSearchItem(SpGistScanOpaque so, SpGistSearchItem *item)
if (item->traversalValue)
pfree(item->traversalValue);
+ if (item->isLeaf && item->leafTuple)
+ pfree(item->leafTuple);
+
pfree(item);
}
@@ -134,6 +138,8 @@ spgAddStartItem(SpGistScanOpaque so, bool isnull)
startEntry->recheck = false;
startEntry->recheckDistances = false;
+ startEntry->leafTuple = NULL;
+
spgAddSearchItemToQueue(so, startEntry);
}
@@ -438,14 +444,29 @@ spgendscan(IndexScanDesc scan)
* Leaf SpGistSearchItem constructor, called in queue context
*/
static SpGistSearchItem *
-spgNewHeapItem(SpGistScanOpaque so, int level, ItemPointer heapPtr,
+spgNewHeapItem(SpGistScanOpaque so, int level, SpGistLeafTuple leafTuple,
Datum leafValue, bool recheck, bool recheckDistances,
bool isnull, double *distances)
{
SpGistSearchItem *item = spgAllocSearchItem(so, isnull, distances);
+ /*
+ * If there are include attributes search item in the queue should
+ * contain them.
+ */
+ if (so->state.includeTupdesc)
+ {
+ Assert(so->state.includeTupdesc->natts);
+
+ item->leafTuple = palloc(leafTuple->size);
+ memcpy(item->leafTuple, leafTuple, leafTuple->size);
+ }
+ else
+ {
+ item->leafTuple = NULL;
+ }
item->level = level;
- item->heapPtr = *heapPtr;
+ item->heapPtr = leafTuple->heapPtr;
/* copy value to queue cxt out of tmp cxt */
item->value = isnull ? (Datum) 0 :
datumCopy(leafValue, so->state.attLeafType.attbyval,
@@ -503,6 +524,8 @@ spgLeafTest(SpGistScanOpaque so, SpGistSearchItem *item,
in.returnData = so->want_itup;
in.leafDatum = SGLTDATUM(leafTuple, &so->state);
+
+
out.leafValue = (Datum) 0;
out.recheck = false;
out.distances = NULL;
@@ -528,13 +551,12 @@ spgLeafTest(SpGistScanOpaque so, SpGistSearchItem *item,
/* the scan is ordered -> add the item to the queue */
MemoryContext oldCxt = MemoryContextSwitchTo(so->traversalCxt);
SpGistSearchItem *heapItem = spgNewHeapItem(so, item->level,
- &leafTuple->heapPtr,
+ leafTuple,
leafValue,
recheck,
recheckDistances,
isnull,
distances);
-
spgAddSearchItemToQueue(so, heapItem);
MemoryContextSwitchTo(oldCxt);
@@ -543,8 +565,10 @@ spgLeafTest(SpGistScanOpaque so, SpGistSearchItem *item,
{
/* non-ordered scan, so report the item right away */
Assert(!recheckDistances);
+
storeRes(so, &leafTuple->heapPtr, leafValue, isnull,
- recheck, false, NULL);
+ recheck, false, NULL, leafTuple);
+
*reportedSome = true;
}
}
@@ -736,7 +760,7 @@ spgTestLeafTuple(SpGistScanOpaque so,
/* dead tuple should be first in chain */
Assert(offset == ItemPointerGetOffsetNumber(&item->heapPtr));
/* No live entries on this page */
- Assert(leafTuple->nextOffset == InvalidOffsetNumber);
+ Assert(SGLT_GET_OFFSET(leafTuple->nextOffset) == InvalidOffsetNumber);
return SpGistBreakOffsetNumber;
}
}
@@ -750,7 +774,7 @@ spgTestLeafTuple(SpGistScanOpaque so,
spgLeafTest(so, item, leafTuple, isnull, reportedSome, storeRes);
- return leafTuple->nextOffset;
+ return SGLT_GET_OFFSET(leafTuple->nextOffset);
}
/*
@@ -782,8 +806,8 @@ redirect:
{
/* We store heap items in the queue only in case of ordered search */
Assert(so->numberOfNonNullOrderBys > 0);
- storeRes(so, &item->heapPtr, item->value, item->isNull,
- item->recheck, item->recheckDistances, item->distances);
+ storeRes(so, &item->heapPtr, item->value, item->isNull, item->recheck,
+ item->recheckDistances, item->distances, item->leafTuple);
reportedSome = true;
}
else
@@ -877,7 +901,7 @@ redirect:
static void
storeBitmap(SpGistScanOpaque so, ItemPointer heapPtr,
Datum leafValue, bool isnull, bool recheck, bool recheckDistances,
- double *distances)
+ double *distances, SpGistLeafTuple leafTuple)
{
Assert(!recheckDistances && !distances);
tbm_add_tuples(so->tbm, heapPtr, 1, recheck);
@@ -904,7 +928,7 @@ spggetbitmap(IndexScanDesc scan, TIDBitmap *tbm)
static void
storeGettuple(SpGistScanOpaque so, ItemPointer heapPtr,
Datum leafValue, bool isnull, bool recheck, bool recheckDistances,
- double *nonNullDistances)
+ double *nonNullDistances, SpGistLeafTuple leafTuple)
{
Assert(so->nPtrs < MaxIndexTuplesPerPage);
so->heapPtrs[so->nPtrs] = *heapPtr;
@@ -923,7 +947,7 @@ storeGettuple(SpGistScanOpaque so, ItemPointer heapPtr,
for (i = 0; i < so->numberOfOrderBys; i++)
{
- int offset = so->nonNullOrderByOffsets[i];
+ int offset = so->nonNullOrderByOffsets[i];
if (offset >= 0)
{
@@ -949,9 +973,35 @@ storeGettuple(SpGistScanOpaque so, ItemPointer heapPtr,
* Reconstruct index data. We have to copy the datum out of the temp
* context anyway, so we may as well create the tuple here.
*/
- so->reconTups[so->nPtrs] = heap_form_tuple(so->indexTupDesc,
+ if (so->state.includeTupdesc)
+ {
+ /* Add included attributes */
+ Datum *leafDatums;
+ bool *leafIsnulls;
+
+ Assert(so->state.includeTupdesc->natts);
+
+ leafDatums = (Datum *) palloc(sizeof(Datum) * (so->state.includeTupdesc->natts + 1));
+ leafIsnulls = (bool *) palloc(sizeof(bool) * (so->state.includeTupdesc->natts + 1));
+
+ SpGistDeformLeafTuple(leafTuple, &so->state, leafDatums, leafIsnulls, isnull);
+
+ /* override key value extracted from LeafTuple in case we've reconstructed it already */
+ leafDatums[0]=leafValue;
+ leafIsnulls[0]=isnull;
+
+ so->reconTups[so->nPtrs] = heap_form_tuple(so->indexTupDesc,
+ leafDatums,
+ leafIsnulls);
+ pfree(leafDatums);
+ pfree(leafIsnulls);
+ }
+ else
+ {
+ so->reconTups[so->nPtrs] = heap_form_tuple(so->indexTupDesc,
&leafValue,
&isnull);
+ }
}
so->nPtrs++;
}
@@ -1018,6 +1068,9 @@ bool
spgcanreturn(Relation index, int attno)
{
SpGistCache *cache;
+
+ /* Included attributes always can be fetched for index-only scans */
+ if (attno > 1) return true;
/* We can do it if the opclass config function says so */
cache = spgGetCache(index);
diff --git a/src/backend/access/spgist/spgutils.c b/src/backend/access/spgist/spgutils.c
index 0efe05e552..9e8ebc9f87 100644
--- a/src/backend/access/spgist/spgutils.c
+++ b/src/backend/access/spgist/spgutils.c
@@ -31,8 +31,18 @@
#include "utils/index_selfuncs.h"
#include "utils/lsyscache.h"
#include "utils/syscache.h"
-
-
+#include "access/itup.h"
+#include "access/detoast.h"
+#include "access/toast_internals.h"
+#include "access/heaptoast.h"
+#include "utils/expandeddatum.h"
+
+/* Does att's datatype allow packing into the 1-byte-header varlena format? */
+#define ATT_IS_PACKABLE(att) \
+ ((att)->attlen == -1 && (att)->attstorage != TYPSTORAGE_PLAIN)
+
+Size spgIncludedDataSize(TupleDesc tupleDesc, Datum *values,
+ bool *isnull, Size start);
/*
* SP-GiST handler function: return IndexAmRoutine with access method parameters
* and callbacks.
@@ -49,7 +59,7 @@ spghandler(PG_FUNCTION_ARGS)
amroutine->amcanorderbyop = true;
amroutine->amcanbackward = false;
amroutine->amcanunique = false;
- amroutine->amcanmulticol = false;
+ amroutine->amcanmulticol = true;
amroutine->amoptionalkey = true;
amroutine->amsearcharray = false;
amroutine->amsearchnulls = true;
@@ -57,7 +67,7 @@ spghandler(PG_FUNCTION_ARGS)
amroutine->amclusterable = false;
amroutine->ampredlocks = false;
amroutine->amcanparallel = false;
- amroutine->amcaninclude = false;
+ amroutine->amcaninclude = true;
amroutine->amusemaintenanceworkmem = false;
amroutine->amparallelvacuumoptions =
VACUUM_OPTION_PARALLEL_BULKDEL | VACUUM_OPTION_PARALLEL_COND_CLEANUP;
@@ -112,18 +122,17 @@ spgGetCache(Relation index)
FmgrInfo *procinfo;
Buffer metabuffer;
SpGistMetaPageData *metadata;
-
cache = MemoryContextAllocZero(index->rd_indexcxt,
sizeof(SpGistCache));
- /* SPGiST doesn't support multi-column indexes */
- Assert(index->rd_att->natts == 1);
-
+ /* SPGiST should have one key column and can also have included columns */
+ Assert(IndexRelationGetNumberOfKeyAttributes(index) == 1);
/*
- * Get the actual data type of the indexed column from the index
+ * Get the actual data type of the key column from the index
* tupdesc. We pass this to the opclass config function so that
* polymorphic opclasses are possible.
*/
+
atttype = TupleDescAttr(index->rd_att, 0)->atttypid;
/* Call the config function to get config info for the opclass */
@@ -156,6 +165,7 @@ spgGetCache(Relation index)
fillTypeDesc(&cache->attPrefixType, cache->config.prefixType);
fillTypeDesc(&cache->attLabelType, cache->config.labelType);
+
/* Last, get the lastUsedPages data from the metapage */
metabuffer = ReadBuffer(index, SPGIST_METAPAGE_BLKNO);
LockBuffer(metabuffer, BUFFER_LOCK_SHARE);
@@ -177,7 +187,22 @@ spgGetCache(Relation index)
/* assume it's up to date */
cache = (SpGistCache *) index->rd_amcache;
}
+ /* Form descriptor for included columns if any */
+ if (IndexRelationGetNumberOfAttributes(index) > 1)
+ {
+ int i;
+ cache->includeTupdesc = CreateTemplateTupleDesc(
+ IndexRelationGetNumberOfAttributes(index) - 1);
+ for (i = 0; i < IndexRelationGetNumberOfAttributes(index) - 1; i++)
+ {
+ TupleDescInitEntry(cache->includeTupdesc, i + 1, NULL,
+ TupleDescAttr(index->rd_att, i + 1)->atttypid,
+ -1, 0);
+ }
+ }
+ else
+ cache->includeTupdesc = NULL;
return cache;
}
@@ -190,6 +215,7 @@ initSpGistState(SpGistState *state, Relation index)
/* Get cached static information about index */
cache = spgGetCache(index);
+ state->includeTupdesc = cache->includeTupdesc;
state->config = cache->config;
state->attType = cache->attType;
state->attLeafType = cache->attLeafType;
@@ -603,7 +629,7 @@ spgoptions(Datum reloptions, bool validate)
/*
* Get the space needed to store a non-null datum of the indicated type.
- * Note the result is already rounded up to a MAXALIGN boundary.
+ * Note the result is not maxaligned and this should be done by caller if needed.
* Also, we follow the SPGiST convention that pass-by-val types are
* just stored in their Datum representation (compare memcpyDatum).
*/
@@ -619,7 +645,7 @@ SpGistGetTypeSize(SpGistTypeDesc *att, Datum datum)
else
size = VARSIZE_ANY(datum);
- return MAXALIGN(size);
+ return size;
}
/*
@@ -642,36 +668,198 @@ memcpyDatum(void *target, SpGistTypeDesc *att, Datum datum)
}
/*
- * Construct a leaf tuple containing the given heap TID and datum value
+ * Private version of heap_compute_data_size with start address not
+ * necessarily MAXALIGNed. The reason is that start address (and alignment)
+ * influence alignment of each of next values and overall size of included
+ * data area in SpGiST leaf tuple.
+ */
+Size
+spgIncludedDataSize(TupleDesc tupleDesc,
+ Datum *values,
+ bool *isnull, Size start)
+{
+ Size data_length = 0;
+ int i;
+ int numberOfAttributes = tupleDesc->natts;
+
+ data_length = start;
+ for (i = 0; i < numberOfAttributes; i++)
+ {
+ Datum val;
+ Form_pg_attribute atti;
+
+ if (isnull[i])
+ continue;
+
+ val = values[i];
+ atti = TupleDescAttr(tupleDesc, i);
+
+ if (ATT_IS_PACKABLE(atti) &&
+ VARATT_CAN_MAKE_SHORT(DatumGetPointer(val)))
+ {
+ /*
+ * we're anticipating converting to a short varlena header, so
+ * adjust length and don't count any alignment
+ */
+ data_length += VARATT_CONVERTED_SHORT_SIZE(DatumGetPointer(val));
+ }
+ else if (atti->attlen == -1 &&
+ VARATT_IS_EXTERNAL_EXPANDED(DatumGetPointer(val)))
+ {
+ /*
+ * we want to flatten the expanded value so that the constructed
+ * tuple doesn't depend on it
+ */
+ data_length = att_align_nominal(data_length, atti->attalign);
+ data_length += EOH_get_flat_size(DatumGetEOHP(val));
+ }
+ else
+ {
+ data_length = att_align_datum(data_length, atti->attalign,
+ atti->attlen, val);
+ data_length = att_addlength_datum(data_length, atti->attlen,
+ val);
+ }
+ }
+ return data_length-start;
+}
+
+/* Calculate overall leaf tuple size. SGLTHDRSZ is MAXALIGNed only for backward
+ * compatibility and there might be gap between header and key data. After key
+ * data there are no such gaps more than is is necessary for each value
+ * alignment. Overall result is MAXALIGNed.*/
+unsigned int SpgLeafSize(SpGistState *state, Datum *datum, bool *isnull)
+{
+ /* compute space needed, nullmask size and offset for include attributes */
+ unsigned int size = SGLTHDRSZ;
+ unsigned int i;
+ if (!isnull[0])
+ /* key attribute size (not maxaligned) */
+ size += SpGistGetTypeSize(&state->attLeafType, datum[0]);
+
+ if (state->includeTupdesc)
+ {
+ Assert(state->includeTupdesc->natts);
+ if (state->includeTupdesc->natts + 1 >= INDEX_MAX_KEYS)
+ ereport(ERROR,
+ (errcode(ERRCODE_TOO_MANY_COLUMNS),
+ errmsg("number of index columns (%d) exceeds limit (%d)",
+ state->includeTupdesc->natts, INDEX_MAX_KEYS)));
+ /* nullmask size */
+ for (i = 1; i <= state->includeTupdesc->natts; i++)
+ {
+ if (isnull[i])
+ {
+ size += (state->includeTupdesc->natts / 8) + 1;
+ break;
+ }
+ }
+ /* overall included attributes size each with added proper alignment. */
+ size += spgIncludedDataSize(state->includeTupdesc, datum+1, isnull+1, size);
+ }
+return MAXALIGN(size);
+}
+
+/*
+ * Construct a leaf tuple containing the given heap TID, key data and included
+ * columns data. Key data starts from MAXALIGN boundary for backward compatibility.
+ * Nullmask apply only to included attributes and is placed just after key data if
+ * there is at least one NULL among included attributes. It doesn't need alignment.
+ * Then all included columns data follow aligned by their typealign's.
*/
SpGistLeafTuple
spgFormLeafTuple(SpGistState *state, ItemPointer heapPtr,
- Datum datum, bool isnull)
+ Datum *datum, bool *isnull)
{
SpGistLeafTuple tup;
- unsigned int size;
+ unsigned int size=SGLTHDRSZ;
+ unsigned int include_offset=0;
+ unsigned int nullmask_size=0;
+ unsigned int data_offset=0;
+ unsigned int data_size=0;
+ uint16 tupmask=0;
+ int i;
- /* compute space needed (note result is already maxaligned) */
- size = SGLTHDRSZ;
- if (!isnull)
- size += SpGistGetTypeSize(&state->attLeafType, datum);
+ /*
+ * Calculate space needed. If there are include attributes also calculate sizes and
+ * offsets needed for heap_fill_tuple
+ */
+ if (!isnull[0])
+ /* key attribute size (not maxaligned) */
+ size += SpGistGetTypeSize(&state->attLeafType, datum[0]);
+
+ if (state->includeTupdesc)
+ {
+ Assert(state->includeTupdesc->natts);
+ if (state->includeTupdesc->natts + 1 >= INDEX_MAX_KEYS)
+ ereport(ERROR,
+ (errcode(ERRCODE_TOO_MANY_COLUMNS),
+ errmsg("number of index columns (%d) exceeds limit (%d)",
+ state->includeTupdesc->natts, INDEX_MAX_KEYS)));
+
+ include_offset = size;
+
+ for (i = 1; i <= state->includeTupdesc->natts; i++)
+ {
+ if (isnull[i])
+ {
+ nullmask_size = (state->includeTupdesc->natts / 8) + 1;
+ size += nullmask_size;
+ break;
+ }
+ }
+
+ /*
+ * Alignment of all included attributes is counted inside data_size. data_offset
+ * itself is not aligned.
+ */
+ data_size = spgIncludedDataSize(state->includeTupdesc, datum+1, isnull+1, size);
+ data_offset=size;
+
+ size += data_size;
+ }
/*
* Ensure that we can replace the tuple with a dead tuple later. This
- * test is unnecessary when !isnull, but let's be safe.
+ * test is unnecessary when !isnull[0], but let's be safe.
*/
if (size < SGDTSIZE)
size = SGDTSIZE;
/* OK, form the tuple */
- tup = (SpGistLeafTuple) palloc0(size);
+ tup = (SpGistLeafTuple) palloc0(MAXALIGN(size));
- tup->size = size;
- tup->nextOffset = InvalidOffsetNumber;
+ tup->size = MAXALIGN(size);
+ SGLT_SET_OFFSET(tup->nextOffset, InvalidOffsetNumber);
tup->heapPtr = *heapPtr;
- if (!isnull)
- memcpyDatum(SGLTDATAPTR(tup), &state->attLeafType, datum);
+ if (!isnull[0])
+ memcpyDatum(SGLTDATAPTR(tup), &state->attLeafType, datum[0]);
+
+ /* Add included columns data to leaf tuple if any. */
+ if (state->includeTupdesc)
+ {
+ /* The start of include attributes tuple is not aligned by default. All values
+ * alignment should be done by heap_fill_tuple automaticaly. If there is a nulls
+ * mask it is included just after key attribute data and it should not be aligned.
+ */
+ heap_fill_tuple(state->includeTupdesc, datum+1, isnull+1,
+ (char *) tup + data_offset,
+ data_size, &tupmask,
+ (nullmask_size ? (bits8 *) tup + include_offset : NULL) );
+
+ if (nullmask_size)
+ SGLT_SET_CONTAINSNULLMASK(tup->nextOffset, 1);
+
+ /*
+ * We do this because heap_fill_tuple wants to initialize a "tupmask"
+ * which is used for HeapTuples, but the only relevant info is the
+ * "has variable attributes" field. We have already set the hasnull
+ * bit above.
+ */
+ if (tupmask & HEAP_HASVARWIDTH)
+ SGLT_SET_CONTAINSVARATT(tup->nextOffset, 1);
+ }
return tup;
}
@@ -688,10 +876,10 @@ spgFormNodeTuple(SpGistState *state, Datum label, bool isnull)
unsigned int size;
unsigned short infomask = 0;
- /* compute space needed (note result is already maxaligned) */
+ /* compute space needed*/
size = SGNTHDRSZ;
if (!isnull)
- size += SpGistGetTypeSize(&state->attLabelType, label);
+ size += MAXALIGN(SpGistGetTypeSize(&state->attLabelType, label));
/*
* Here we make sure that the size will fit in the field reserved for it
@@ -735,7 +923,7 @@ spgFormInnerTuple(SpGistState *state, bool hasPrefix, Datum prefix,
/* Compute size needed */
if (hasPrefix)
- prefixSize = SpGistGetTypeSize(&state->attPrefixType, prefix);
+ prefixSize = MAXALIGN(SpGistGetTypeSize(&state->attPrefixType, prefix));
else
prefixSize = 0;
@@ -1046,3 +1234,128 @@ spgproperty(Oid index_oid, int attno,
return true;
}
+
+/*
+ * Convert an SpGist tuple into palloc'd Datum/isnull arrays.
+ *
+ */
+void
+SpGistDeformLeafTuple(SpGistLeafTuple tup, SpGistState *state, Datum *datum, bool *isnull,
+ bool key_isnull)
+{
+ unsigned int include_offset;/* offset of include data */
+ int off;
+ bits8 *nullmask_ptr = NULL; /* ptr to null bitmap in tuple */
+ char *tp;
+ bool slow = false; /* can we use/set attcacheoff? */
+ int i;
+
+ if (key_isnull)
+ {
+ datum[0] = (Datum) 0;
+ isnull[0] = true;
+ }
+ else
+ {
+ datum[0] = SGLTDATUM(tup, state);
+ isnull[0] = false;
+ }
+
+ if (state->includeTupdesc)
+ {
+ Assert(state->includeTupdesc->natts);
+ if (state->includeTupdesc->natts + 1 >= INDEX_MAX_KEYS)
+ ereport(ERROR,
+ (errcode(ERRCODE_TOO_MANY_COLUMNS),
+ errmsg("number of index columns (%d) exceeds limit (%d)",
+ state->includeTupdesc->natts, INDEX_MAX_KEYS)));
+
+ include_offset = key_isnull ? SGLTHDRSZ : SGLTHDRSZ + SpGistGetTypeSize(&state->attLeafType, datum[0]);
+
+ tp = (char*) tup;
+ off = include_offset;
+
+ if (SGLT_GET_CONTAINSNULLMASK(tup->nextOffset))
+ {
+ nullmask_ptr = (bits8 *) tp + include_offset;
+ off += (state->includeTupdesc->natts) / 8 + 1;
+ }
+
+ if (state->attLeafType.attlen > 0 && !SGLT_GET_CONTAINSVARATT(tup->nextOffset) &&
+ !SGLT_GET_CONTAINSNULLMASK(tup->nextOffset))
+ /* can use attcacheoff for all attributes */
+ {
+ for (i = 1; i <= state->includeTupdesc->natts; i++)
+ {
+ Form_pg_attribute thisatt = TupleDescAttr(state->includeTupdesc, i - 1);
+ isnull[i] = false;
+ if (thisatt->attcacheoff >= 0)
+ off = thisatt->attcacheoff;
+ else
+ {
+ off = att_align_nominal(off, thisatt->attalign);
+ thisatt->attcacheoff = off;
+ }
+ datum[i] = fetchatt(thisatt, tp + off);
+ off = att_addlength_pointer(off, thisatt->attlen, tp + off);
+ }
+ }
+ else
+ /* general case: can use cache until first null or varlen attribute */
+ {
+ if (state->attLeafType.attlen <= 0)
+ slow = true; /* can't use attcacheoff at all*/
+
+ for (i = 1; i <= state->includeTupdesc->natts; i++)
+ {
+ Form_pg_attribute thisatt = TupleDescAttr(state->includeTupdesc, i - 1);
+
+ if (SGLT_GET_CONTAINSNULLMASK(tup->nextOffset))
+ {
+ if (att_isnull(i - 1, nullmask_ptr))
+ {
+ datum[i] = (Datum) 0;
+ isnull[i] = true;
+ slow = true; /* can't use attcacheoff anymore */
+ continue;
+ }
+ }
+
+ isnull[i] = false;
+
+ if (!slow && thisatt->attcacheoff >= 0)
+ off = thisatt->attcacheoff;
+ else if (thisatt->attlen == -1)
+ {
+ /*
+ * We can only cache the offset for a varlena attribute if the
+ * offset is already suitably aligned, so that there would be no
+ * pad bytes in any case: then the offset will be valid for either
+ * an aligned or unaligned value.
+ */
+ if (!slow && off == att_align_nominal(off, thisatt->attalign))
+ thisatt->attcacheoff = off;
+ else
+ {
+ off = att_align_pointer(off, thisatt->attalign, -1, tp + off);
+ slow = true;
+ }
+ }
+ else
+ {
+ /* not varlena, so safe to use att_align_nominal */
+ off = att_align_nominal(off, thisatt->attalign);
+
+ if (!slow)
+ thisatt->attcacheoff = off;
+ }
+
+ datum[i] = fetchatt(thisatt, tp + off);
+ off = att_addlength_pointer(off, thisatt->attlen, tp + off);
+
+ if (thisatt->attlen <= 0)
+ slow = true; /* can't use attcacheoff anymore */
+ }
+ }
+ }
+}
diff --git a/src/backend/access/spgist/spgvacuum.c b/src/backend/access/spgist/spgvacuum.c
index bd98707f3c..a9433f0ad4 100644
--- a/src/backend/access/spgist/spgvacuum.c
+++ b/src/backend/access/spgist/spgvacuum.c
@@ -168,23 +168,25 @@ vacuumLeafPage(spgBulkDeleteState *bds, Relation index, Buffer buffer,
}
/* Form predecessor map, too */
- if (lt->nextOffset != InvalidOffsetNumber)
+ if (SGLT_GET_OFFSET(lt->nextOffset) != InvalidOffsetNumber)
{
/* paranoia about corrupted chain links */
- if (lt->nextOffset < FirstOffsetNumber ||
- lt->nextOffset > max ||
- predecessor[lt->nextOffset] != InvalidOffsetNumber)
+ if (SGLT_GET_OFFSET(lt->nextOffset) < FirstOffsetNumber ||
+ SGLT_GET_OFFSET(lt->nextOffset) > max ||
+ predecessor[SGLT_GET_OFFSET(lt->nextOffset)] != InvalidOffsetNumber)
elog(ERROR, "inconsistent tuple chain links in page %u of index \"%s\"",
BufferGetBlockNumber(buffer),
RelationGetRelationName(index));
- predecessor[lt->nextOffset] = i;
+ predecessor[SGLT_GET_OFFSET(lt->nextOffset)] = i;
}
}
else if (lt->tupstate == SPGIST_REDIRECT)
{
SpGistDeadTuple dt = (SpGistDeadTuple) lt;
- Assert(dt->nextOffset == InvalidOffsetNumber);
+ // Dead tuple nextOffset is allowed to have highest bit 0 or 1 in case it is
+ // inherited from SpGistLeafTuple where it has its own meaning.
+ Assert(SGLT_GET_OFFSET(dt->nextOffset) == InvalidOffsetNumber);
Assert(ItemPointerIsValid(&dt->pointer));
/*
@@ -201,7 +203,7 @@ vacuumLeafPage(spgBulkDeleteState *bds, Relation index, Buffer buffer,
}
else
{
- Assert(lt->nextOffset == InvalidOffsetNumber);
+ Assert(SGLT_GET_OFFSET(lt->nextOffset) == InvalidOffsetNumber);
}
}
@@ -250,7 +252,7 @@ vacuumLeafPage(spgBulkDeleteState *bds, Relation index, Buffer buffer,
prevLive = deletable[i] ? InvalidOffsetNumber : i;
/* scan down the chain ... */
- j = head->nextOffset;
+ j = SGLT_GET_OFFSET(head->nextOffset);
while (j != InvalidOffsetNumber)
{
SpGistLeafTuple lt;
@@ -301,7 +303,7 @@ vacuumLeafPage(spgBulkDeleteState *bds, Relation index, Buffer buffer,
interveningDeletable = false;
}
- j = lt->nextOffset;
+ j = SGLT_GET_OFFSET(lt->nextOffset);
}
if (prevLive == InvalidOffsetNumber)
@@ -366,7 +368,7 @@ vacuumLeafPage(spgBulkDeleteState *bds, Relation index, Buffer buffer,
lt = (SpGistLeafTuple) PageGetItem(page,
PageGetItemId(page, chainSrc[i]));
Assert(lt->tupstate == SPGIST_LIVE);
- lt->nextOffset = chainDest[i];
+ SGLT_SET_OFFSET(lt->nextOffset, chainDest[i]);
}
MarkBufferDirty(buffer);
diff --git a/src/backend/access/spgist/spgxlog.c b/src/backend/access/spgist/spgxlog.c
index 7be2291d07..4022e3af07 100644
--- a/src/backend/access/spgist/spgxlog.c
+++ b/src/backend/access/spgist/spgxlog.c
@@ -122,8 +122,8 @@ spgRedoAddLeaf(XLogReaderState *record)
head = (SpGistLeafTuple) PageGetItem(page,
PageGetItemId(page, xldata->offnumHeadLeaf));
- Assert(head->nextOffset == leafTupleHdr.nextOffset);
- head->nextOffset = xldata->offnumLeaf;
+ Assert(SGLT_GET_OFFSET(head->nextOffset) == SGLT_GET_OFFSET(leafTupleHdr.nextOffset));
+ SGLT_SET_OFFSET(head->nextOffset, xldata->offnumLeaf);
}
}
else
@@ -822,7 +822,7 @@ spgRedoVacuumLeaf(XLogReaderState *record)
lt = (SpGistLeafTuple) PageGetItem(page,
PageGetItemId(page, chainSrc[i]));
Assert(lt->tupstate == SPGIST_LIVE);
- lt->nextOffset = chainDest[i];
+ SGLT_SET_OFFSET(lt->nextOffset, chainDest[i]);
}
PageSetLSN(page, lsn);
diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
index e976201030..514d5e21e4 100644
--- a/src/backend/utils/error/elog.c
+++ b/src/backend/utils/error/elog.c
@@ -2,8 +2,6 @@
*
* elog.c
* error logging and reporting
- *
- * Because of the extremely high rate at which log messages can be generated,
* we need to be mindful of the performance cost of obtaining any information
* that may be logged. Also, it's important to keep in mind that this code may
* get called from within an aborted transaction, in which case operations
@@ -244,6 +242,7 @@ errstart(int elevel, const char *domain)
*/
if (elevel >= ERROR)
{
+ // abort();
/*
* If we are inside a critical section, all errors become PANIC
* errors. See miscadmin.h.
diff --git a/src/include/access/spgist_private.h b/src/include/access/spgist_private.h
index 00b98ec6a0..c16ee8c322 100644
--- a/src/include/access/spgist_private.h
+++ b/src/include/access/spgist_private.h
@@ -141,6 +141,7 @@ typedef struct SpGistState
SpGistTypeDesc attLeafType; /* type of leaf-tuple values */
SpGistTypeDesc attPrefixType; /* type of inner-tuple prefix values */
SpGistTypeDesc attLabelType; /* type of node label values */
+ TupleDesc includeTupdesc; /* tuple descriptor of included columns */
char *deadTupleStorage; /* workspace for spgFormDeadTuple */
@@ -148,6 +149,91 @@ typedef struct SpGistState
bool isBuild; /* true if doing index build */
} SpGistState;
+/*
+ * SPGiST leaf tuple: carries a datum and a heap tuple TID
+ *
+ * In the simplest case, the datum is the same as the indexed value; but
+ * it could also be a suffix or some other sort of delta that permits
+ * reconstruction given knowledge of the prefix path traversed to get here.
+ *
+ * The size field is wider than could possibly be needed for an on-disk leaf
+ * tuple, but this allows us to form leaf tuples even when the datum is too
+ * wide to be stored immediately, and it costs nothing because of alignment
+ * considerations.
+ *
+ * Normally, nextOffset links to the next tuple belonging to the same parent
+ * node (which must be on the same page). But when the root page is a leaf
+ * page, we don't chain its tuples, so nextOffset is always 0 on the root.
+ *
+ * size must be a multiple of MAXALIGN; also, it must be at least SGDTSIZE
+ * so that the tuple can be converted to REDIRECT status later. (This
+ * restriction only adds bytes for the null-datum case, otherwise alignment
+ * restrictions force it anyway.)
+ *
+ * In a leaf tuple for a NULL indexed value, there's no useful datum value;
+ * however, the SGDTSIZE limit ensures that's there's a Datum word there
+ * anyway, so SGLTDATUM can be applied safely as long as you don't do
+ * anything with the result.
+ *
+ * As SpGistLeafTuple has header of 8 bytes so max value for nextOffset is
+ * (when page size is 65KB) is 8192 and 15 bit is sufficient to store it. So
+ * higher bit is reserved to store information is there nulls mask between leaf
+ * datum and first include value (if any). Size of null mask is 1 byte per each 8
+ * include columns.
+ */
+
+typedef struct SpGistLeafTupleData
+{
+ unsigned int tupstate:2, /* LIVE/REDIRECT/DEAD/PLACEHOLDER */
+ size:30; /* large enough for any palloc'able value */
+ OffsetNumber nextOffset; /* higher 1 bit = 1 if included values has nulls
+ 2 bit = 1 if included values contain variable length values
+ lower 15 bits - nextOffset - points to the next tuple in chain,
+ or InvalidOffsetNumber. They SHOULD NOT be set/read directly,
+ SGLT_SET_OFFSET/SGLT_GET_OFFSET macro must be used instead. */
+ ItemPointerData heapPtr; /* TID of represented heap tuple */
+ /* leaf datum follows */
+ /* if SGLT_GET_CONTAINSNULLMASK nullmask follows. Its size (number of included columns/8)+1 */
+ /* include attributes follow if any*/
+} SpGistLeafTupleData;
+
+typedef SpGistLeafTupleData *SpGistLeafTuple;
+
+#define SGLTHDRSZ MAXALIGN(sizeof(SpGistLeafTupleData))
+#define SGLTDATAPTR(x) (((char *) (x)) + SGLTHDRSZ)
+#define SGLTDATUM(x, s) ((s)->attLeafType.attbyval ? \
+ *(Datum *) SGLTDATAPTR(x) : \
+ PointerGetDatum(SGLTDATAPTR(x)))
+/*
+ * Accessor macros for nextOffset and null mask presence bit.
+ * It's a bit of hack that these macros also safely apply to IncludeTupMetadata which has the same
+ * structure. Include tuple size of maximum 13 bits (see INDEX_SIZE_MASK) is stored there instead
+ * of NextOffset which is 14 bits. IncludeTupMetadata is a vehicle to transfer included tuple header
+ * as IncludeTuple is now filled before SpGistLeafTuple initialized.
+ */
+#define SGLT_GET_OFFSET(x) ( (x) & 0x3FFF )
+#define SGLT_GET_CONTAINSNULLMASK(x) ( (x) >> 15 )
+#define SGLT_GET_CONTAINSVARATT(x) ( ( (x) & 4000 ) >> 14 )
+#define SGLT_SET_OFFSET(x,o) ( (x) = ( (x) & 0xC000 ) | ( (o) & 0x3FFF) )
+#define SGLT_SET_CONTAINSNULLMASK(x,n) ( (x) = ( (n) << 15 ) | ( (x) & 0x3FFF ) )
+#define SGLT_SET_CONTAINSVARATT(x,v) ( (x) = ( (v) << 14 ) | ( (x) & 0xBFFF ) )
+
+#define SGLT_GET_INCLUDE_TUPSIZE(x) SGLT_GET_OFFSET(x)
+#define SGLT_SET_INCLUDE_TUPSIZE(x,o) SGLT_SET_OFFSET(x,o)
+
+extern char *SpGistFormIncludeTuple(TupleDesc tupleDescriptor, Datum *values,
+ bool *isnull, uint16 *tupdata);
+/*
+ * SPGiST dead tuple: declaration for examining non-live tuples
+ *
+ * The tupstate field of this struct must match those of regular inner and
+ * leaf tuples, and its size field must match a leaf tuple's.
+ * Also, the pointer field must be in the same place as a leaf tuple's heapPtr
+ * field, to satisfy some Asserts that we make when replacing a leaf tuple
+ * with a dead tuple.
+ * We don't use nextOffset, but it's needed to align the pointer field.
+ */
+
typedef struct SpGistSearchItem
{
pairingheap_node phNode; /* pairing heap node */
@@ -160,14 +246,14 @@ typedef struct SpGistSearchItem
bool isLeaf; /* SearchItem is heap item */
bool recheck; /* qual recheck is needed */
bool recheckDistances; /* distance recheck is needed */
-
+ SpGistLeafTuple leafTuple;
/* array with numberOfOrderBys entries */
double distances[FLEXIBLE_ARRAY_MEMBER];
+ /* if there are include columns SpGistLeafTupleData follow */
} SpGistSearchItem;
#define SizeOfSpGistSearchItem(n_distances) \
(offsetof(SpGistSearchItem, distances) + sizeof(double) * (n_distances))
-
/*
* Private state of an index scan
*/
@@ -241,6 +327,7 @@ typedef struct SpGistCache
SpGistTypeDesc attLeafType; /* type of leaf-tuple values */
SpGistTypeDesc attPrefixType; /* type of inner-tuple prefix values */
SpGistTypeDesc attLabelType; /* type of node label values */
+ TupleDesc includeTupdesc;
SpGistLUPCache lastUsedPages; /* local storage of last-used info */
} SpGistCache;
@@ -321,60 +408,6 @@ typedef SpGistNodeTupleData *SpGistNodeTuple;
*(Datum *) SGNTDATAPTR(x) : \
PointerGetDatum(SGNTDATAPTR(x)))
-/*
- * SPGiST leaf tuple: carries a datum and a heap tuple TID
- *
- * In the simplest case, the datum is the same as the indexed value; but
- * it could also be a suffix or some other sort of delta that permits
- * reconstruction given knowledge of the prefix path traversed to get here.
- *
- * The size field is wider than could possibly be needed for an on-disk leaf
- * tuple, but this allows us to form leaf tuples even when the datum is too
- * wide to be stored immediately, and it costs nothing because of alignment
- * considerations.
- *
- * Normally, nextOffset links to the next tuple belonging to the same parent
- * node (which must be on the same page). But when the root page is a leaf
- * page, we don't chain its tuples, so nextOffset is always 0 on the root.
- *
- * size must be a multiple of MAXALIGN; also, it must be at least SGDTSIZE
- * so that the tuple can be converted to REDIRECT status later. (This
- * restriction only adds bytes for the null-datum case, otherwise alignment
- * restrictions force it anyway.)
- *
- * In a leaf tuple for a NULL indexed value, there's no useful datum value;
- * however, the SGDTSIZE limit ensures that's there's a Datum word there
- * anyway, so SGLTDATUM can be applied safely as long as you don't do
- * anything with the result.
- */
-typedef struct SpGistLeafTupleData
-{
- unsigned int tupstate:2, /* LIVE/REDIRECT/DEAD/PLACEHOLDER */
- size:30; /* large enough for any palloc'able value */
- OffsetNumber nextOffset; /* next tuple in chain, or InvalidOffsetNumber */
- ItemPointerData heapPtr; /* TID of represented heap tuple */
- /* leaf datum follows */
-} SpGistLeafTupleData;
-
-typedef SpGistLeafTupleData *SpGistLeafTuple;
-
-#define SGLTHDRSZ MAXALIGN(sizeof(SpGistLeafTupleData))
-#define SGLTDATAPTR(x) (((char *) (x)) + SGLTHDRSZ)
-#define SGLTDATUM(x, s) ((s)->attLeafType.attbyval ? \
- *(Datum *) SGLTDATAPTR(x) : \
- PointerGetDatum(SGLTDATAPTR(x)))
-
-/*
- * SPGiST dead tuple: declaration for examining non-live tuples
- *
- * The tupstate field of this struct must match those of regular inner and
- * leaf tuples, and its size field must match a leaf tuple's.
- * Also, the pointer field must be in the same place as a leaf tuple's heapPtr
- * field, to satisfy some Asserts that we make when replacing a leaf tuple
- * with a dead tuple.
- * We don't use nextOffset, but it's needed to align the pointer field.
- * pointer and xid are only valid when tupstate = REDIRECT.
- */
typedef struct SpGistDeadTupleData
{
unsigned int tupstate:2, /* LIVE/REDIRECT/DEAD/PLACEHOLDER */
@@ -394,7 +427,6 @@ typedef SpGistDeadTupleData *SpGistDeadTuple;
* size plus sizeof(ItemIdData) (for the line pointer). This works correctly
* so long as tuple sizes are always maxaligned.
*/
-
/* Page capacity after allowing for fixed header and special space */
#define SPGIST_PAGE_CAPACITY \
MAXALIGN_DOWN(BLCKSZ - \
@@ -456,9 +488,10 @@ extern void SpGistInitPage(Page page, uint16 f);
extern void SpGistInitBuffer(Buffer b, uint16 f);
extern void SpGistInitMetapage(Page page);
extern unsigned int SpGistGetTypeSize(SpGistTypeDesc *att, Datum datum);
+extern unsigned int SpgLeafSize(SpGistState *state, Datum *datum, bool *isnull);
extern SpGistLeafTuple spgFormLeafTuple(SpGistState *state,
ItemPointer heapPtr,
- Datum datum, bool isnull);
+ Datum *datum, bool *isnull);
extern SpGistNodeTuple spgFormNodeTuple(SpGistState *state,
Datum label, bool isnull);
extern SpGistInnerTuple spgFormInnerTuple(SpGistState *state,
@@ -466,6 +499,8 @@ extern SpGistInnerTuple spgFormInnerTuple(SpGistState *state,
int nNodes, SpGistNodeTuple *nodes);
extern SpGistDeadTuple spgFormDeadTuple(SpGistState *state, int tupstate,
BlockNumber blkno, OffsetNumber offnum);
+extern void SpGistDeformLeafTuple(SpGistLeafTuple tup, SpGistState *state,
+ Datum *datum, bool *isnull, bool key_value_isnull);
extern Datum *spgExtractNodeLabels(SpGistState *state,
SpGistInnerTuple innerTuple);
extern OffsetNumber SpGistPageAddNewItem(SpGistState *state, Page page,
@@ -484,7 +519,7 @@ extern void spgPageIndexMultiDelete(SpGistState *state, Page page,
int firststate, int reststate,
BlockNumber blkno, OffsetNumber offnum);
extern bool spgdoinsert(Relation index, SpGistState *state,
- ItemPointer heapPtr, Datum datum, bool isnull);
+ ItemPointer heapPtr, Datum *datum, bool *isnull);
/* spgproc.c */
extern double *spg_key_orderbys_distances(Datum key, bool isLeaf,
diff --git a/src/test/regress/expected/amutils.out b/src/test/regress/expected/amutils.out
index d92a6d12c6..93e6a43b6d 100644
--- a/src/test/regress/expected/amutils.out
+++ b/src/test/regress/expected/amutils.out
@@ -169,9 +169,9 @@ select amname, prop, pg_indexam_has_property(a.oid, prop) as p
hash | bogus |
spgist | can_order | f
spgist | can_unique | f
- spgist | can_multi_col | f
+ spgist | can_multi_col | t
spgist | can_exclude | t
- spgist | can_include | f
+ spgist | can_include | t
spgist | bogus |
(36 rows)
diff --git a/src/test/regress/expected/index_including.out b/src/test/regress/expected/index_including.out
index 8e5d53e712..4fd2b7e878 100644
--- a/src/test/regress/expected/index_including.out
+++ b/src/test/regress/expected/index_including.out
@@ -356,7 +356,6 @@ CREATE INDEX on tbl USING brin(c1, c2) INCLUDE (c3, c4);
ERROR: access method "brin" does not support included columns
CREATE INDEX on tbl USING gist(c3) INCLUDE (c1, c4);
CREATE INDEX on tbl USING spgist(c3) INCLUDE (c4);
-ERROR: access method "spgist" does not support included columns
CREATE INDEX on tbl USING gin(c1, c2) INCLUDE (c3, c4);
ERROR: access method "gin" does not support included columns
CREATE INDEX on tbl USING hash(c1, c2) INCLUDE (c3, c4);
diff --git a/src/test/regress/expected/index_including_spgist.out b/src/test/regress/expected/index_including_spgist.out
new file mode 100644
index 0000000000..fa64766fb7
--- /dev/null
+++ b/src/test/regress/expected/index_including_spgist.out
@@ -0,0 +1,139 @@
+/*
+ * 1.1. test CREATE INDEX with buffered build
+ */
+-- Regular index with included columns
+CREATE TABLE tbl_spgist (c1 int, c2 int, c3 int, c4 box);
+-- size is chosen to exceed page size and trigger actual truncation
+INSERT INTO tbl_spgist SELECT x, 2*x, 3*x, box(point(x,x+1),point(2*x,2*x+1)) FROM generate_series(1,8000) AS x;
+CREATE INDEX tbl_spgist_idx ON tbl_spgist using spgist (c4) INCLUDE (c1,c2,c3);
+SELECT pg_get_indexdef(i.indexrelid)
+FROM pg_index i JOIN pg_class c ON i.indexrelid = c.oid
+WHERE i.indrelid = 'tbl_spgist'::regclass ORDER BY c.relname;
+ pg_get_indexdef
+-----------------------------------------------------------------------------------------
+ CREATE INDEX tbl_spgist_idx ON public.tbl_spgist USING spgist (c4) INCLUDE (c1, c2, c3)
+(1 row)
+
+SELECT * FROM tbl_spgist where c4 <@ box(point(1,1),point(10,10));
+ c1 | c2 | c3 | c4
+----+----+----+-------------
+ 1 | 2 | 3 | (2,3),(1,2)
+ 2 | 4 | 6 | (4,5),(2,3)
+ 3 | 6 | 9 | (6,7),(3,4)
+ 4 | 8 | 12 | (8,9),(4,5)
+(4 rows)
+
+SET enable_bitmapscan TO off;
+EXPLAIN (costs off) SELECT * FROM tbl_spgist where c4 <@ box(point(1,1),point(10,10));
+ QUERY PLAN
+----------------------------------------------------
+ Index Only Scan using tbl_spgist_idx on tbl_spgist
+ Index Cond: (c4 <@ '(10,10),(1,1)'::box)
+(2 rows)
+
+SET enable_bitmapscan TO default;
+DROP TABLE tbl_spgist;
+/*
+ * 1.2. test CREATE INDEX with inserts
+ */
+-- Regular index with included columns
+CREATE TABLE tbl_spgist (c1 int, c2 int, c3 int, c4 box);
+-- size is chosen to exceed page size and trigger actual truncation
+CREATE INDEX tbl_spgist_idx ON tbl_spgist using spgist (c4) INCLUDE (c1,c2,c3);
+INSERT INTO tbl_spgist SELECT x, 2*x, 3*x, box(point(x,x+1),point(2*x,2*x+1)) FROM generate_series(1,8000) AS x;
+SELECT pg_get_indexdef(i.indexrelid)
+FROM pg_index i JOIN pg_class c ON i.indexrelid = c.oid
+WHERE i.indrelid = 'tbl_spgist'::regclass ORDER BY c.relname;
+ pg_get_indexdef
+-----------------------------------------------------------------------------------------
+ CREATE INDEX tbl_spgist_idx ON public.tbl_spgist USING spgist (c4) INCLUDE (c1, c2, c3)
+(1 row)
+
+SELECT * FROM tbl_spgist where c4 <@ box(point(1,1),point(10,10));
+ c1 | c2 | c3 | c4
+----+----+----+-------------
+ 1 | 2 | 3 | (2,3),(1,2)
+ 2 | 4 | 6 | (4,5),(2,3)
+ 3 | 6 | 9 | (6,7),(3,4)
+ 4 | 8 | 12 | (8,9),(4,5)
+(4 rows)
+
+SET enable_bitmapscan TO off;
+EXPLAIN (costs off) SELECT * FROM tbl_spgist where c4 <@ box(point(1,1),point(10,10));
+ QUERY PLAN
+----------------------------------------------------
+ Index Only Scan using tbl_spgist_idx on tbl_spgist
+ Index Cond: (c4 <@ '(10,10),(1,1)'::box)
+(2 rows)
+
+SET enable_bitmapscan TO default;
+DROP TABLE tbl_spgist;
+/*
+ * 2. CREATE INDEX CONCURRENTLY
+ */
+CREATE TABLE tbl_spgist (c1 int, c2 int, c3 int, c4 box);
+INSERT INTO tbl_spgist SELECT x, 2*x, 3*x, box(point(x,x+1),point(2*x,2*x+1)) FROM generate_series(1,10) AS x;
+CREATE INDEX CONCURRENTLY tbl_spgist_idx ON tbl_spgist using spgist (c4) INCLUDE (c1,c2,c3);
+SELECT indexdef FROM pg_indexes WHERE tablename = 'tbl_spgist' ORDER BY indexname;
+ indexdef
+-----------------------------------------------------------------------------------------
+ CREATE INDEX tbl_spgist_idx ON public.tbl_spgist USING spgist (c4) INCLUDE (c1, c2, c3)
+(1 row)
+
+DROP TABLE tbl_spgist;
+/*
+ * 3. REINDEX
+ */
+CREATE TABLE tbl_spgist (c1 int, c2 int, c3 int, c4 box);
+INSERT INTO tbl_spgist SELECT x, 2*x, 3*x, box(point(x,x+1),point(2*x,2*x+1)) FROM generate_series(1,10) AS x;
+CREATE INDEX tbl_spgist_idx ON tbl_spgist using spgist (c4) INCLUDE (c1,c3);
+SELECT indexdef FROM pg_indexes WHERE tablename = 'tbl_spgist' ORDER BY indexname;
+ indexdef
+-------------------------------------------------------------------------------------
+ CREATE INDEX tbl_spgist_idx ON public.tbl_spgist USING spgist (c4) INCLUDE (c1, c3)
+(1 row)
+
+REINDEX INDEX tbl_spgist_idx;
+SELECT indexdef FROM pg_indexes WHERE tablename = 'tbl_spgist' ORDER BY indexname;
+ indexdef
+-------------------------------------------------------------------------------------
+ CREATE INDEX tbl_spgist_idx ON public.tbl_spgist USING spgist (c4) INCLUDE (c1, c3)
+(1 row)
+
+ALTER TABLE tbl_spgist DROP COLUMN c1;
+SELECT indexdef FROM pg_indexes WHERE tablename = 'tbl_spgist' ORDER BY indexname;
+ indexdef
+----------
+(0 rows)
+
+DROP TABLE tbl_spgist;
+/*
+ * 4. Update, delete values in indexed table.
+ */
+CREATE TABLE tbl_spgist (c1 int, c2 int, c3 int, c4 box);
+INSERT INTO tbl_spgist SELECT x, 2*x, 3*x, box(point(x,x+1),point(2*x,2*x+1)) FROM generate_series(1,10) AS x;
+CREATE INDEX tbl_spgist_idx ON tbl_spgist using spgist (c4) INCLUDE (c1,c3);
+UPDATE tbl_spgist SET c1 = 100 WHERE c1 = 2;
+UPDATE tbl_spgist SET c1 = 1 WHERE c1 = 3;
+DELETE FROM tbl_spgist WHERE c1 = 5 OR c3 = 12;
+DROP TABLE tbl_spgist;
+/*
+ * 5. Alter column type.
+ */
+CREATE TABLE tbl_spgist (c1 int, c2 int, c3 int, c4 box);
+INSERT INTO tbl_spgist SELECT x, 2*x, 3*x, box(point(x,x+1),point(2*x,2*x+1)) FROM generate_series(1,10) AS x;
+CREATE INDEX tbl_spgist_idx ON tbl_spgist using spgist (c4) INCLUDE (c1,c3);
+ALTER TABLE tbl_spgist ALTER c1 TYPE bigint;
+ALTER TABLE tbl_spgist ALTER c3 TYPE bigint;
+\d tbl_spgist
+ Table "public.tbl_spgist"
+ Column | Type | Collation | Nullable | Default
+--------+---------+-----------+----------+---------
+ c1 | bigint | | |
+ c2 | integer | | |
+ c3 | bigint | | |
+ c4 | box | | |
+Indexes:
+ "tbl_spgist_idx" spgist (c4) INCLUDE (c1, c3)
+
+DROP TABLE tbl_spgist;
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 026ea880cd..985458a1a8 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -50,7 +50,7 @@ test: copy copyselect copydml insert insert_conflict
# ----------
test: create_misc create_operator create_procedure
# These depend on create_misc and create_operator
-test: create_index create_index_spgist create_view index_including index_including_gist
+test: create_index create_index_spgist create_view index_including index_including_gist index_including_spgist
# ----------
# Another group of parallel tests
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index 979d926119..f3df961535 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -68,6 +68,7 @@ test: create_index_spgist
test: create_view
test: index_including
test: index_including_gist
+test: index_including_spgist
test: create_aggregate
test: create_function_3
test: create_cast
diff --git a/src/test/regress/sql/index_including_spgist.sql b/src/test/regress/sql/index_including_spgist.sql
new file mode 100644
index 0000000000..a59e73aa22
--- /dev/null
+++ b/src/test/regress/sql/index_including_spgist.sql
@@ -0,0 +1,81 @@
+/*
+ * 1.1. test CREATE INDEX with buffered build
+ */
+
+-- Regular index with included columns
+CREATE TABLE tbl_spgist (c1 int, c2 int, c3 int, c4 box);
+-- size is chosen to exceed page size and trigger actual truncation
+INSERT INTO tbl_spgist SELECT x, 2*x, 3*x, box(point(x,x+1),point(2*x,2*x+1)) FROM generate_series(1,8000) AS x;
+CREATE INDEX tbl_spgist_idx ON tbl_spgist using spgist (c4) INCLUDE (c1,c2,c3);
+SELECT pg_get_indexdef(i.indexrelid)
+FROM pg_index i JOIN pg_class c ON i.indexrelid = c.oid
+WHERE i.indrelid = 'tbl_spgist'::regclass ORDER BY c.relname;
+SELECT * FROM tbl_spgist where c4 <@ box(point(1,1),point(10,10));
+SET enable_bitmapscan TO off;
+EXPLAIN (costs off) SELECT * FROM tbl_spgist where c4 <@ box(point(1,1),point(10,10));
+SET enable_bitmapscan TO default;
+DROP TABLE tbl_spgist;
+
+/*
+ * 1.2. test CREATE INDEX with inserts
+ */
+
+-- Regular index with included columns
+CREATE TABLE tbl_spgist (c1 int, c2 int, c3 int, c4 box);
+-- size is chosen to exceed page size and trigger actual truncation
+CREATE INDEX tbl_spgist_idx ON tbl_spgist using spgist (c4) INCLUDE (c1,c2,c3);
+INSERT INTO tbl_spgist SELECT x, 2*x, 3*x, box(point(x,x+1),point(2*x,2*x+1)) FROM generate_series(1,8000) AS x;
+SELECT pg_get_indexdef(i.indexrelid)
+FROM pg_index i JOIN pg_class c ON i.indexrelid = c.oid
+WHERE i.indrelid = 'tbl_spgist'::regclass ORDER BY c.relname;
+SELECT * FROM tbl_spgist where c4 <@ box(point(1,1),point(10,10));
+SET enable_bitmapscan TO off;
+EXPLAIN (costs off) SELECT * FROM tbl_spgist where c4 <@ box(point(1,1),point(10,10));
+SET enable_bitmapscan TO default;
+DROP TABLE tbl_spgist;
+
+/*
+ * 2. CREATE INDEX CONCURRENTLY
+ */
+CREATE TABLE tbl_spgist (c1 int, c2 int, c3 int, c4 box);
+INSERT INTO tbl_spgist SELECT x, 2*x, 3*x, box(point(x,x+1),point(2*x,2*x+1)) FROM generate_series(1,10) AS x;
+CREATE INDEX CONCURRENTLY tbl_spgist_idx ON tbl_spgist using spgist (c4) INCLUDE (c1,c2,c3);
+SELECT indexdef FROM pg_indexes WHERE tablename = 'tbl_spgist' ORDER BY indexname;
+DROP TABLE tbl_spgist;
+
+
+/*
+ * 3. REINDEX
+ */
+CREATE TABLE tbl_spgist (c1 int, c2 int, c3 int, c4 box);
+INSERT INTO tbl_spgist SELECT x, 2*x, 3*x, box(point(x,x+1),point(2*x,2*x+1)) FROM generate_series(1,10) AS x;
+CREATE INDEX tbl_spgist_idx ON tbl_spgist using spgist (c4) INCLUDE (c1,c3);
+SELECT indexdef FROM pg_indexes WHERE tablename = 'tbl_spgist' ORDER BY indexname;
+REINDEX INDEX tbl_spgist_idx;
+SELECT indexdef FROM pg_indexes WHERE tablename = 'tbl_spgist' ORDER BY indexname;
+ALTER TABLE tbl_spgist DROP COLUMN c1;
+SELECT indexdef FROM pg_indexes WHERE tablename = 'tbl_spgist' ORDER BY indexname;
+DROP TABLE tbl_spgist;
+
+/*
+ * 4. Update, delete values in indexed table.
+ */
+CREATE TABLE tbl_spgist (c1 int, c2 int, c3 int, c4 box);
+INSERT INTO tbl_spgist SELECT x, 2*x, 3*x, box(point(x,x+1),point(2*x,2*x+1)) FROM generate_series(1,10) AS x;
+CREATE INDEX tbl_spgist_idx ON tbl_spgist using spgist (c4) INCLUDE (c1,c3);
+UPDATE tbl_spgist SET c1 = 100 WHERE c1 = 2;
+UPDATE tbl_spgist SET c1 = 1 WHERE c1 = 3;
+DELETE FROM tbl_spgist WHERE c1 = 5 OR c3 = 12;
+DROP TABLE tbl_spgist;
+
+/*
+ * 5. Alter column type.
+ */
+CREATE TABLE tbl_spgist (c1 int, c2 int, c3 int, c4 box);
+INSERT INTO tbl_spgist SELECT x, 2*x, 3*x, box(point(x,x+1),point(2*x,2*x+1)) FROM generate_series(1,10) AS x;
+CREATE INDEX tbl_spgist_idx ON tbl_spgist using spgist (c4) INCLUDE (c1,c3);
+ALTER TABLE tbl_spgist ALTER c1 TYPE bigint;
+ALTER TABLE tbl_spgist ALTER c3 TYPE bigint;
+\d tbl_spgist
+DROP TABLE tbl_spgist;
+
7 авг. 2020 г., в 16:59, Pavel Borisov <pashkin.elfe@gmail.com> написал(а):
As usual I very much appreciate your feedback
Thanks for the patch! Looks interesting.
On a first glance the whole concept of non-multicolumn index with included attributes seems...well, just difficult to understand.
But I expect for SP-GiST this must be single key with multiple included attributes, right?
I couldn't find a test that checks impossibility of on 2-column SP-GiST, only few asserts about it. Is this checked somewhere else?
Thanks!
Best regards, Andrey Borodin.
On a first glance the whole concept of non-multicolumn index with included
attributes seems...well, just difficult to understand.
But I expect for SP-GiST this must be single key with multiple included
attributes, right?
I couldn't find a test that checks impossibility of on 2-column SP-GiST,
only few asserts about it. Is this checked somewhere else?
Yes, SpGist is by its construction a single-column index, there is no such
thing like 2-column SP-GiST yet. In the same way like original SpGist will
refuse to add a second key column, this remains after modification as well,
with exception of columns attached by INCLUDE directive. They can be
(INDEX_MAX_KEYS -1) pieces and they will not be used to create additional
index trees (as there is only one), they will be just attached to the key
tree leafs tuple.
I also little bit corrected error reporting for the case when user wants to
invoke index build with not one column. Thanks!
--
Best regards,
Pavel Borisov
Postgres Professional: http://postgrespro.com <http://www.postgrespro.com>
Attachments:
spgist-covering-0002.diffapplication/octet-stream; name=spgist-covering-0002.diffDownload
diff --git a/src/backend/access/spgist/spgdoinsert.c b/src/backend/access/spgist/spgdoinsert.c
index 934d65b89f..b767a805fa 100644
--- a/src/backend/access/spgist/spgdoinsert.c
+++ b/src/backend/access/spgist/spgdoinsert.c
@@ -22,7 +22,7 @@
#include "miscadmin.h"
#include "storage/bufmgr.h"
#include "utils/rel.h"
-
+#include "access/htup_details.h"
/*
* SPPageDesc tracks all info about a page we are inserting into. In some
@@ -220,7 +220,7 @@ addLeafTuple(Relation index, SpGistState *state, SpGistLeafTuple leafTuple,
SpGistBlockIsRoot(current->blkno))
{
/* Tuple is not part of a chain */
- leafTuple->nextOffset = InvalidOffsetNumber;
+ SGLT_SET_OFFSET (leafTuple->nextOffset, InvalidOffsetNumber);
current->offnum = SpGistPageAddNewItem(state, current->page,
(Item) leafTuple, leafTuple->size,
NULL, false);
@@ -253,7 +253,7 @@ addLeafTuple(Relation index, SpGistState *state, SpGistLeafTuple leafTuple,
PageGetItemId(current->page, current->offnum));
if (head->tupstate == SPGIST_LIVE)
{
- leafTuple->nextOffset = head->nextOffset;
+ SGLT_SET_OFFSET(leafTuple->nextOffset, SGLT_GET_OFFSET(head->nextOffset));
offnum = SpGistPageAddNewItem(state, current->page,
(Item) leafTuple, leafTuple->size,
NULL, false);
@@ -264,14 +264,14 @@ addLeafTuple(Relation index, SpGistState *state, SpGistLeafTuple leafTuple,
*/
head = (SpGistLeafTuple) PageGetItem(current->page,
PageGetItemId(current->page, current->offnum));
- head->nextOffset = offnum;
+ SGLT_SET_OFFSET(head->nextOffset, offnum);
xlrec.offnumLeaf = offnum;
xlrec.offnumHeadLeaf = current->offnum;
}
else if (head->tupstate == SPGIST_DEAD)
{
- leafTuple->nextOffset = InvalidOffsetNumber;
+ SGLT_SET_OFFSET(leafTuple->nextOffset,InvalidOffsetNumber);
PageIndexTupleDelete(current->page, current->offnum);
if (PageAddItem(current->page,
(Item) leafTuple, leafTuple->size,
@@ -362,13 +362,13 @@ checkSplitConditions(Relation index, SpGistState *state,
{
/* We could see a DEAD tuple as first/only chain item */
Assert(i == current->offnum);
- Assert(it->nextOffset == InvalidOffsetNumber);
+ Assert(SGLT_GET_OFFSET(it->nextOffset) == InvalidOffsetNumber);
/* Don't count it in result, because it won't go to other page */
}
else
elog(ERROR, "unexpected SPGiST tuple state: %d", it->tupstate);
- i = it->nextOffset;
+ i = SGLT_GET_OFFSET(it->nextOffset);
}
*nToSplit = n;
@@ -437,7 +437,7 @@ moveLeafs(Relation index, SpGistState *state,
{
/* We could see a DEAD tuple as first/only chain item */
Assert(i == current->offnum);
- Assert(it->nextOffset == InvalidOffsetNumber);
+ Assert(SGLT_GET_OFFSET(it->nextOffset) == InvalidOffsetNumber);
/* We don't want to move it, so don't count it in size */
toDelete[nDelete] = i;
nDelete++;
@@ -446,7 +446,7 @@ moveLeafs(Relation index, SpGistState *state,
else
elog(ERROR, "unexpected SPGiST tuple state: %d", it->tupstate);
- i = it->nextOffset;
+ i = SGLT_GET_OFFSET(it->nextOffset);
}
/* Find a leaf page that will hold them */
@@ -475,7 +475,7 @@ moveLeafs(Relation index, SpGistState *state,
* don't care). We're modifying the tuple on the source page
* here, but it's okay since we're about to delete it.
*/
- it->nextOffset = r;
+ SGLT_SET_OFFSET(it->nextOffset,r);
r = SpGistPageAddNewItem(state, npage, (Item) it, it->size,
&startOffset, false);
@@ -490,7 +490,7 @@ moveLeafs(Relation index, SpGistState *state,
}
/* add the new tuple as well */
- newLeafTuple->nextOffset = r;
+ SGLT_SET_OFFSET(newLeafTuple->nextOffset, r);
r = SpGistPageAddNewItem(state, npage,
(Item) newLeafTuple, newLeafTuple->size,
&startOffset, false);
@@ -709,6 +709,9 @@ doPickSplit(Relation index, SpGistState *state,
int nToDelete,
nToInsert,
maxToInclude;
+ Datum *leafChainDatums;
+ bool *leafChainIsnulls;
+ const int natts = IndexRelationGetNumberOfAttributes(index);
in.level = level;
@@ -723,14 +726,16 @@ doPickSplit(Relation index, SpGistState *state,
toInsert = (OffsetNumber *) palloc(sizeof(OffsetNumber) * n);
newLeafs = (SpGistLeafTuple *) palloc(sizeof(SpGistLeafTuple) * n);
leafPageSelect = (uint8 *) palloc(sizeof(uint8) * n);
-
STORE_STATE(state, xlrec.stateSrc);
+ leafChainDatums = (Datum *) palloc(n*natts*sizeof(Datum));
+ leafChainIsnulls = (bool *) palloc(n*natts*sizeof(bool));
+
/*
- * Form list of leaf tuples which will be distributed as split result;
- * also, count up the amount of space that will be freed from current.
- * (Note that in the non-root case, we won't actually delete the old
- * tuples, only replace them with redirects or placeholders.)
+ * Collect leaf tuples which will be distributed as split result; also,
+ * count up the amount of space that will be freed from current. (Note
+ * that in the non-root case, we won't actually delete the old tuples,
+ * only replace them with redirects or placeholders.)
*
* Note: the SGLTDATUM calls here are safe even when dealing with a nulls
* page. For a pass-by-value data type we will fetch a word that must
@@ -738,7 +743,14 @@ doPickSplit(Relation index, SpGistState *state,
* tuples must have size at least SGDTSIZE). For a pass-by-reference type
* we are just computing a pointer that isn't going to get dereferenced.
* So it's not worth guarding the calls with isNulls checks.
+ *
+ * Datums and isnulls of all leaf tuple attributes in a chain are collected
+ * into 2-d arrays: (number of tuples in chain) x (number of attributes)
+ * First attribute is key, the other - included attributes (if any). After
+ * picksplit we need to form new leaf tuples as key attribute length can
+ * change which can affect alignment of every include attribute.
*/
+
nToInsert = 0;
nToDelete = 0;
spaceToDelete = 0;
@@ -759,6 +771,8 @@ doPickSplit(Relation index, SpGistState *state,
{
in.datums[nToInsert] = SGLTDATUM(it, state);
heapPtrs[nToInsert] = it->heapPtr;
+ SpGistDeformLeafTuple(it, state, leafChainDatums+nToInsert*natts,
+ leafChainIsnulls+nToInsert*natts, isNulls);
nToInsert++;
toDelete[nToDelete] = i;
nToDelete++;
@@ -784,6 +798,9 @@ doPickSplit(Relation index, SpGistState *state,
{
in.datums[nToInsert] = SGLTDATUM(it, state);
heapPtrs[nToInsert] = it->heapPtr;
+
+ SpGistDeformLeafTuple(it, state, leafChainDatums+nToInsert*natts,
+ leafChainIsnulls+nToInsert*natts, isNulls);
nToInsert++;
toDelete[nToDelete] = i;
nToDelete++;
@@ -795,7 +812,7 @@ doPickSplit(Relation index, SpGistState *state,
{
/* We could see a DEAD tuple as first/only chain item */
Assert(i == current->offnum);
- Assert(it->nextOffset == InvalidOffsetNumber);
+ Assert(SGLT_GET_OFFSET(it->nextOffset) == InvalidOffsetNumber);
toDelete[nToDelete] = i;
nToDelete++;
/* replacing it with redirect will save no space */
@@ -803,7 +820,7 @@ doPickSplit(Relation index, SpGistState *state,
else
elog(ERROR, "unexpected SPGiST tuple state: %d", it->tupstate);
- i = it->nextOffset;
+ i = SGLT_GET_OFFSET(it->nextOffset);
}
}
in.nTuples = nToInsert;
@@ -816,10 +833,17 @@ doPickSplit(Relation index, SpGistState *state,
*/
in.datums[in.nTuples] = SGLTDATUM(newLeafTuple, state);
heapPtrs[in.nTuples] = newLeafTuple->heapPtr;
+
+ SpGistDeformLeafTuple(newLeafTuple, state, leafChainDatums+(in.nTuples)*natts,
+ leafChainIsnulls+(in.nTuples)*natts, isNulls);
in.nTuples++;
memset(&out, 0, sizeof(out));
+ /*
+ * Process collected key values of tuples from the chain. Included values are used
+ * to build fresh leaf tuples unchanged.
+ */
if (!isNulls)
{
/*
@@ -837,9 +861,11 @@ doPickSplit(Relation index, SpGistState *state,
totalLeafSizes = 0;
for (i = 0; i < in.nTuples; i++)
{
- newLeafs[i] = spgFormLeafTuple(state, heapPtrs + i,
- out.leafTupleDatums[i],
- false);
+ *(leafChainDatums+i*natts)=(Datum) out.leafTupleDatums[i];
+ *(leafChainIsnulls+i*natts)=false;
+
+ newLeafs[i] = spgFormLeafTuple(state, heapPtrs + i, leafChainDatums+i*natts,
+ leafChainIsnulls+i*natts);
totalLeafSizes += newLeafs[i]->size + sizeof(ItemIdData);
}
}
@@ -860,9 +886,14 @@ doPickSplit(Relation index, SpGistState *state,
totalLeafSizes = 0;
for (i = 0; i < in.nTuples; i++)
{
- newLeafs[i] = spgFormLeafTuple(state, heapPtrs + i,
- (Datum) 0,
- true);
+ /*
+ * Nulls tree can contain only null key values.
+ */
+ *(leafChainDatums+i*natts)=(Datum) 0;
+ *(leafChainIsnulls+i*natts)=true;
+
+ newLeafs[i] = spgFormLeafTuple(state, heapPtrs + i, leafChainDatums+i*natts,
+ leafChainIsnulls+i*natts);
totalLeafSizes += newLeafs[i]->size + sizeof(ItemIdData);
}
}
@@ -1196,10 +1227,10 @@ doPickSplit(Relation index, SpGistState *state,
if (ItemPointerIsValid(&nodes[n]->t_tid))
{
Assert(ItemPointerGetBlockNumber(&nodes[n]->t_tid) == leafBlock);
- it->nextOffset = ItemPointerGetOffsetNumber(&nodes[n]->t_tid);
+ SGLT_SET_OFFSET(it->nextOffset, ItemPointerGetOffsetNumber(&nodes[n]->t_tid));
}
else
- it->nextOffset = InvalidOffsetNumber;
+ SGLT_SET_OFFSET(it->nextOffset, InvalidOffsetNumber);
/* Insert it on page */
newoffset = SpGistPageAddNewItem(state, BufferGetPage(leafBuffer),
@@ -1889,67 +1920,81 @@ spgSplitNodeAction(Relation index, SpGistState *state,
*/
bool
spgdoinsert(Relation index, SpGistState *state,
- ItemPointer heapPtr, Datum datum, bool isnull)
+ ItemPointer heapPtr, Datum *datum, bool *isnull)
{
int level = 0;
- Datum leafDatum;
+ Datum *leafDatum;
int leafSize;
SPPageDesc current,
parent;
FmgrInfo *procinfo = NULL;
+ int i;
/*
* Look up FmgrInfo of the user-defined choose function once, to save
* cycles in the loop below.
*/
- if (!isnull)
+ if (!isnull[0])
procinfo = index_getprocinfo(index, 1, SPGIST_CHOOSE_PROC);
/*
* Prepare the leaf datum to insert.
- *
- * If an optional "compress" method is provided, then call it to form the
- * leaf datum from the input datum. Otherwise store the input datum as
+ */
+
+ leafDatum = (Datum *) palloc0(sizeof(Datum) * (IndexRelationGetNumberOfAttributes(index)));
+
+ /* If an optional "compress" method is provided, then call it to form the
+ * key datum from the input datum. Otherwise store the input datum as
* is. Since we don't use index_form_tuple in this AM, we have to make
* sure value to be inserted is not toasted; FormIndexDatum doesn't
* guarantee that. But we assume the "compress" method to return an
* untoasted value.
*/
- if (!isnull)
+ if (!isnull[0])
{
if (OidIsValid(index_getprocid(index, 1, SPGIST_COMPRESS_PROC)))
{
FmgrInfo *compressProcinfo = NULL;
compressProcinfo = index_getprocinfo(index, 1, SPGIST_COMPRESS_PROC);
- leafDatum = FunctionCall1Coll(compressProcinfo,
+ leafDatum[0] = FunctionCall1Coll(compressProcinfo,
index->rd_indcollation[0],
- datum);
+ datum[0]);
}
else
{
Assert(state->attLeafType.type == state->attType.type);
if (state->attType.attlen == -1)
- leafDatum = PointerGetDatum(PG_DETOAST_DATUM(datum));
+ leafDatum[0] = PointerGetDatum(PG_DETOAST_DATUM(datum[0]));
else
- leafDatum = datum;
+ leafDatum[0] = datum[0];
}
}
else
- leafDatum = (Datum) 0;
+ leafDatum[0] = (Datum) 0;
+
+ for (i = 1; i < IndexRelationGetNumberOfAttributes(index); i++)
+ {
+ if (!isnull[i])
+ {
+ if (TupleDescAttr(state->includeTupdesc, i-1)->attlen == -1)
+ leafDatum[i] = PointerGetDatum(PG_DETOAST_DATUM(datum[i]));
+ else
+ leafDatum[i]=datum[i];
+ }
+ else
+ leafDatum[i]=(Datum) 0;
+ }
+
/*
- * Compute space needed for a leaf tuple containing the given datum.
+ * Compute space needed on a page for a leaf tuple containing the given datum.
*
* If it isn't gonna fit, and the opclass can't reduce the datum size by
* suffixing, bail out now rather than getting into an endless loop.
*/
- if (!isnull)
- leafSize = SGLTHDRSZ + sizeof(ItemIdData) +
- SpGistGetTypeSize(&state->attLeafType, leafDatum);
- else
- leafSize = SGDTSIZE + sizeof(ItemIdData);
+ leafSize = SpgLeafSize(state, leafDatum, isnull) + sizeof(ItemIdData);
if (leafSize > SPGIST_PAGE_CAPACITY && !state->config.longValuesOK)
ereport(ERROR,
@@ -1961,7 +2006,7 @@ spgdoinsert(Relation index, SpGistState *state,
errhint("Values larger than a buffer page cannot be indexed.")));
/* Initialize "current" to the appropriate root page */
- current.blkno = isnull ? SPGIST_NULL_BLKNO : SPGIST_ROOT_BLKNO;
+ current.blkno = isnull[0] ? SPGIST_NULL_BLKNO : SPGIST_ROOT_BLKNO;
current.buffer = InvalidBuffer;
current.page = NULL;
current.offnum = FirstOffsetNumber;
@@ -1995,7 +2040,7 @@ spgdoinsert(Relation index, SpGistState *state,
*/
current.buffer =
SpGistGetBuffer(index,
- GBUF_LEAF | (isnull ? GBUF_NULLS : 0),
+ GBUF_LEAF | (isnull[0] ? GBUF_NULLS : 0),
Min(leafSize, SPGIST_PAGE_CAPACITY),
&isNew);
current.blkno = BufferGetBlockNumber(current.buffer);
@@ -2037,7 +2082,7 @@ spgdoinsert(Relation index, SpGistState *state,
current.page = BufferGetPage(current.buffer);
/* should not arrive at a page of the wrong type */
- if (isnull ? !SpGistPageStoresNulls(current.page) :
+ if (isnull[0] ? !SpGistPageStoresNulls(current.page) :
SpGistPageStoresNulls(current.page))
elog(ERROR, "SPGiST index page %u has wrong nulls flag",
current.blkno);
@@ -2054,7 +2099,7 @@ spgdoinsert(Relation index, SpGistState *state,
{
/* it fits on page, so insert it and we're done */
addLeafTuple(index, state, leafTuple,
- ¤t, &parent, isnull, isNew);
+ ¤t, &parent, isnull[0], isNew);
break;
}
else if ((sizeToSplit =
@@ -2068,14 +2113,14 @@ spgdoinsert(Relation index, SpGistState *state,
* chain to another leaf page rather than splitting it.
*/
Assert(!isNew);
- moveLeafs(index, state, ¤t, &parent, leafTuple, isnull);
+ moveLeafs(index, state, ¤t, &parent, leafTuple, isnull[0]);
break; /* we're done */
}
else
{
/* picksplit */
if (doPickSplit(index, state, ¤t, &parent,
- leafTuple, level, isnull, isNew))
+ leafTuple, level, isnull[0], isNew))
break; /* doPickSplit installed new tuples */
/* leaf tuple will not be inserted yet */
@@ -2110,8 +2155,8 @@ spgdoinsert(Relation index, SpGistState *state,
innerTuple = (SpGistInnerTuple) PageGetItem(current.page,
PageGetItemId(current.page, current.offnum));
- in.datum = datum;
- in.leafDatum = leafDatum;
+ in.datum = datum[0];
+ in.leafDatum = leafDatum[0];
in.level = level;
in.allTheSame = innerTuple->allTheSame;
in.hasPrefix = (innerTuple->prefixSize > 0);
@@ -2121,7 +2166,7 @@ spgdoinsert(Relation index, SpGistState *state,
memset(&out, 0, sizeof(out));
- if (!isnull)
+ if (!isnull[0])
{
/* use user-defined choose method */
FunctionCall2Coll(procinfo,
@@ -2158,11 +2203,11 @@ spgdoinsert(Relation index, SpGistState *state,
/* Adjust level as per opclass request */
level += out.result.matchNode.levelAdd;
/* Replace leafDatum and recompute leafSize */
- if (!isnull)
+ if (!isnull[0])
{
- leafDatum = out.result.matchNode.restDatum;
- leafSize = SGLTHDRSZ + sizeof(ItemIdData) +
- SpGistGetTypeSize(&state->attLeafType, leafDatum);
+ leafDatum[0] = out.result.matchNode.restDatum;
+ leafSize = SpgLeafSize(state, leafDatum, isnull) +
+ sizeof(ItemIdData);
}
/*
@@ -2227,6 +2272,6 @@ spgdoinsert(Relation index, SpGistState *state,
SpGistSetLastUsedPage(index, parent.buffer);
UnlockReleaseBuffer(parent.buffer);
}
-
+ pfree(leafDatum);
return true;
}
diff --git a/src/backend/access/spgist/spginsert.c b/src/backend/access/spgist/spginsert.c
index e4508a2b92..b54ae85f6e 100644
--- a/src/backend/access/spgist/spginsert.c
+++ b/src/backend/access/spgist/spginsert.c
@@ -55,8 +55,7 @@ spgistBuildCallback(Relation index, ItemPointer tid, Datum *values,
* lock on some buffer. So we need to be willing to retry. We can flush
* any temp data when retrying.
*/
- while (!spgdoinsert(index, &buildstate->spgstate, tid,
- *values, *isnull))
+ while (!spgdoinsert(index, &buildstate->spgstate, tid, values, isnull))
{
MemoryContextReset(buildstate->tmpCtx);
}
@@ -226,7 +225,7 @@ spginsert(Relation index, Datum *values, bool *isnull,
* to avoid cumulative memory consumption. That means we also have to
* redo initSpGistState(), but it's cheap enough not to matter.
*/
- while (!spgdoinsert(index, &spgstate, ht_ctid, *values, *isnull))
+ while (!spgdoinsert(index, &spgstate, ht_ctid, values, isnull))
{
MemoryContextReset(insertCtx);
initSpGistState(&spgstate, index);
diff --git a/src/backend/access/spgist/spgscan.c b/src/backend/access/spgist/spgscan.c
index 4d506bfb9a..b5dedc3afd 100644
--- a/src/backend/access/spgist/spgscan.c
+++ b/src/backend/access/spgist/spgscan.c
@@ -28,7 +28,8 @@
typedef void (*storeRes_func) (SpGistScanOpaque so, ItemPointer heapPtr,
Datum leafValue, bool isNull, bool recheck,
- bool recheckDistances, double *distances);
+ bool recheckDistances, double *distances,
+ SpGistLeafTuple leafTuple);
/*
* Pairing heap comparison function for the SpGistSearchItem queue.
@@ -88,6 +89,9 @@ spgFreeSearchItem(SpGistScanOpaque so, SpGistSearchItem *item)
if (item->traversalValue)
pfree(item->traversalValue);
+ if (item->isLeaf && item->leafTuple)
+ pfree(item->leafTuple);
+
pfree(item);
}
@@ -134,6 +138,8 @@ spgAddStartItem(SpGistScanOpaque so, bool isnull)
startEntry->recheck = false;
startEntry->recheckDistances = false;
+ startEntry->leafTuple = NULL;
+
spgAddSearchItemToQueue(so, startEntry);
}
@@ -438,14 +444,29 @@ spgendscan(IndexScanDesc scan)
* Leaf SpGistSearchItem constructor, called in queue context
*/
static SpGistSearchItem *
-spgNewHeapItem(SpGistScanOpaque so, int level, ItemPointer heapPtr,
+spgNewHeapItem(SpGistScanOpaque so, int level, SpGistLeafTuple leafTuple,
Datum leafValue, bool recheck, bool recheckDistances,
bool isnull, double *distances)
{
SpGistSearchItem *item = spgAllocSearchItem(so, isnull, distances);
+ /*
+ * If there are include attributes search item in the queue should
+ * contain them.
+ */
+ if (so->state.includeTupdesc)
+ {
+ Assert(so->state.includeTupdesc->natts);
+
+ item->leafTuple = palloc(leafTuple->size);
+ memcpy(item->leafTuple, leafTuple, leafTuple->size);
+ }
+ else
+ {
+ item->leafTuple = NULL;
+ }
item->level = level;
- item->heapPtr = *heapPtr;
+ item->heapPtr = leafTuple->heapPtr;
/* copy value to queue cxt out of tmp cxt */
item->value = isnull ? (Datum) 0 :
datumCopy(leafValue, so->state.attLeafType.attbyval,
@@ -503,6 +524,8 @@ spgLeafTest(SpGistScanOpaque so, SpGistSearchItem *item,
in.returnData = so->want_itup;
in.leafDatum = SGLTDATUM(leafTuple, &so->state);
+
+
out.leafValue = (Datum) 0;
out.recheck = false;
out.distances = NULL;
@@ -528,13 +551,12 @@ spgLeafTest(SpGistScanOpaque so, SpGistSearchItem *item,
/* the scan is ordered -> add the item to the queue */
MemoryContext oldCxt = MemoryContextSwitchTo(so->traversalCxt);
SpGistSearchItem *heapItem = spgNewHeapItem(so, item->level,
- &leafTuple->heapPtr,
+ leafTuple,
leafValue,
recheck,
recheckDistances,
isnull,
distances);
-
spgAddSearchItemToQueue(so, heapItem);
MemoryContextSwitchTo(oldCxt);
@@ -543,8 +565,10 @@ spgLeafTest(SpGistScanOpaque so, SpGistSearchItem *item,
{
/* non-ordered scan, so report the item right away */
Assert(!recheckDistances);
+
storeRes(so, &leafTuple->heapPtr, leafValue, isnull,
- recheck, false, NULL);
+ recheck, false, NULL, leafTuple);
+
*reportedSome = true;
}
}
@@ -736,7 +760,7 @@ spgTestLeafTuple(SpGistScanOpaque so,
/* dead tuple should be first in chain */
Assert(offset == ItemPointerGetOffsetNumber(&item->heapPtr));
/* No live entries on this page */
- Assert(leafTuple->nextOffset == InvalidOffsetNumber);
+ Assert(SGLT_GET_OFFSET(leafTuple->nextOffset) == InvalidOffsetNumber);
return SpGistBreakOffsetNumber;
}
}
@@ -750,7 +774,7 @@ spgTestLeafTuple(SpGistScanOpaque so,
spgLeafTest(so, item, leafTuple, isnull, reportedSome, storeRes);
- return leafTuple->nextOffset;
+ return SGLT_GET_OFFSET(leafTuple->nextOffset);
}
/*
@@ -782,8 +806,8 @@ redirect:
{
/* We store heap items in the queue only in case of ordered search */
Assert(so->numberOfNonNullOrderBys > 0);
- storeRes(so, &item->heapPtr, item->value, item->isNull,
- item->recheck, item->recheckDistances, item->distances);
+ storeRes(so, &item->heapPtr, item->value, item->isNull, item->recheck,
+ item->recheckDistances, item->distances, item->leafTuple);
reportedSome = true;
}
else
@@ -877,7 +901,7 @@ redirect:
static void
storeBitmap(SpGistScanOpaque so, ItemPointer heapPtr,
Datum leafValue, bool isnull, bool recheck, bool recheckDistances,
- double *distances)
+ double *distances, SpGistLeafTuple leafTuple)
{
Assert(!recheckDistances && !distances);
tbm_add_tuples(so->tbm, heapPtr, 1, recheck);
@@ -904,7 +928,7 @@ spggetbitmap(IndexScanDesc scan, TIDBitmap *tbm)
static void
storeGettuple(SpGistScanOpaque so, ItemPointer heapPtr,
Datum leafValue, bool isnull, bool recheck, bool recheckDistances,
- double *nonNullDistances)
+ double *nonNullDistances, SpGistLeafTuple leafTuple)
{
Assert(so->nPtrs < MaxIndexTuplesPerPage);
so->heapPtrs[so->nPtrs] = *heapPtr;
@@ -923,7 +947,7 @@ storeGettuple(SpGistScanOpaque so, ItemPointer heapPtr,
for (i = 0; i < so->numberOfOrderBys; i++)
{
- int offset = so->nonNullOrderByOffsets[i];
+ int offset = so->nonNullOrderByOffsets[i];
if (offset >= 0)
{
@@ -949,9 +973,35 @@ storeGettuple(SpGistScanOpaque so, ItemPointer heapPtr,
* Reconstruct index data. We have to copy the datum out of the temp
* context anyway, so we may as well create the tuple here.
*/
- so->reconTups[so->nPtrs] = heap_form_tuple(so->indexTupDesc,
+ if (so->state.includeTupdesc)
+ {
+ /* Add included attributes */
+ Datum *leafDatums;
+ bool *leafIsnulls;
+
+ Assert(so->state.includeTupdesc->natts);
+
+ leafDatums = (Datum *) palloc(sizeof(Datum) * (so->state.includeTupdesc->natts + 1));
+ leafIsnulls = (bool *) palloc(sizeof(bool) * (so->state.includeTupdesc->natts + 1));
+
+ SpGistDeformLeafTuple(leafTuple, &so->state, leafDatums, leafIsnulls, isnull);
+
+ /* override key value extracted from LeafTuple in case we've reconstructed it already */
+ leafDatums[0]=leafValue;
+ leafIsnulls[0]=isnull;
+
+ so->reconTups[so->nPtrs] = heap_form_tuple(so->indexTupDesc,
+ leafDatums,
+ leafIsnulls);
+ pfree(leafDatums);
+ pfree(leafIsnulls);
+ }
+ else
+ {
+ so->reconTups[so->nPtrs] = heap_form_tuple(so->indexTupDesc,
&leafValue,
&isnull);
+ }
}
so->nPtrs++;
}
@@ -1018,6 +1068,9 @@ bool
spgcanreturn(Relation index, int attno)
{
SpGistCache *cache;
+
+ /* Included attributes always can be fetched for index-only scans */
+ if (attno > 1) return true;
/* We can do it if the opclass config function says so */
cache = spgGetCache(index);
diff --git a/src/backend/access/spgist/spgutils.c b/src/backend/access/spgist/spgutils.c
index 0efe05e552..d052d52c25 100644
--- a/src/backend/access/spgist/spgutils.c
+++ b/src/backend/access/spgist/spgutils.c
@@ -31,8 +31,18 @@
#include "utils/index_selfuncs.h"
#include "utils/lsyscache.h"
#include "utils/syscache.h"
-
-
+#include "access/itup.h"
+#include "access/detoast.h"
+#include "access/toast_internals.h"
+#include "access/heaptoast.h"
+#include "utils/expandeddatum.h"
+
+/* Does att's datatype allow packing into the 1-byte-header varlena format? */
+#define ATT_IS_PACKABLE(att) \
+ ((att)->attlen == -1 && (att)->attstorage != TYPSTORAGE_PLAIN)
+
+Size spgIncludedDataSize(TupleDesc tupleDesc, Datum *values,
+ bool *isnull, Size start);
/*
* SP-GiST handler function: return IndexAmRoutine with access method parameters
* and callbacks.
@@ -49,7 +59,7 @@ spghandler(PG_FUNCTION_ARGS)
amroutine->amcanorderbyop = true;
amroutine->amcanbackward = false;
amroutine->amcanunique = false;
- amroutine->amcanmulticol = false;
+ amroutine->amcanmulticol = true;
amroutine->amoptionalkey = true;
amroutine->amsearcharray = false;
amroutine->amsearchnulls = true;
@@ -57,7 +67,7 @@ spghandler(PG_FUNCTION_ARGS)
amroutine->amclusterable = false;
amroutine->ampredlocks = false;
amroutine->amcanparallel = false;
- amroutine->amcaninclude = false;
+ amroutine->amcaninclude = true;
amroutine->amusemaintenanceworkmem = false;
amroutine->amparallelvacuumoptions =
VACUUM_OPTION_PARALLEL_BULKDEL | VACUUM_OPTION_PARALLEL_COND_CLEANUP;
@@ -112,18 +122,21 @@ spgGetCache(Relation index)
FmgrInfo *procinfo;
Buffer metabuffer;
SpGistMetaPageData *metadata;
-
cache = MemoryContextAllocZero(index->rd_indexcxt,
sizeof(SpGistCache));
- /* SPGiST doesn't support multi-column indexes */
- Assert(index->rd_att->natts == 1);
+ /* SPGiST should have one key column and can also have included columns */
+ if (IndexRelationGetNumberOfKeyAttributes(index) != 1)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("SPGiST index can have only one key column")));
/*
- * Get the actual data type of the indexed column from the index
+ * Get the actual data type of the key column from the index
* tupdesc. We pass this to the opclass config function so that
* polymorphic opclasses are possible.
*/
+
atttype = TupleDescAttr(index->rd_att, 0)->atttypid;
/* Call the config function to get config info for the opclass */
@@ -156,6 +169,7 @@ spgGetCache(Relation index)
fillTypeDesc(&cache->attPrefixType, cache->config.prefixType);
fillTypeDesc(&cache->attLabelType, cache->config.labelType);
+
/* Last, get the lastUsedPages data from the metapage */
metabuffer = ReadBuffer(index, SPGIST_METAPAGE_BLKNO);
LockBuffer(metabuffer, BUFFER_LOCK_SHARE);
@@ -177,7 +191,22 @@ spgGetCache(Relation index)
/* assume it's up to date */
cache = (SpGistCache *) index->rd_amcache;
}
+ /* Form descriptor for included columns if any */
+ if (IndexRelationGetNumberOfAttributes(index) > 1)
+ {
+ int i;
+ cache->includeTupdesc = CreateTemplateTupleDesc(
+ IndexRelationGetNumberOfAttributes(index) - 1);
+ for (i = 0; i < IndexRelationGetNumberOfAttributes(index) - 1; i++)
+ {
+ TupleDescInitEntry(cache->includeTupdesc, i + 1, NULL,
+ TupleDescAttr(index->rd_att, i + 1)->atttypid,
+ -1, 0);
+ }
+ }
+ else
+ cache->includeTupdesc = NULL;
return cache;
}
@@ -190,6 +219,7 @@ initSpGistState(SpGistState *state, Relation index)
/* Get cached static information about index */
cache = spgGetCache(index);
+ state->includeTupdesc = cache->includeTupdesc;
state->config = cache->config;
state->attType = cache->attType;
state->attLeafType = cache->attLeafType;
@@ -603,7 +633,7 @@ spgoptions(Datum reloptions, bool validate)
/*
* Get the space needed to store a non-null datum of the indicated type.
- * Note the result is already rounded up to a MAXALIGN boundary.
+ * Note the result is not maxaligned and this should be done by caller if needed.
* Also, we follow the SPGiST convention that pass-by-val types are
* just stored in their Datum representation (compare memcpyDatum).
*/
@@ -619,7 +649,7 @@ SpGistGetTypeSize(SpGistTypeDesc *att, Datum datum)
else
size = VARSIZE_ANY(datum);
- return MAXALIGN(size);
+ return size;
}
/*
@@ -642,36 +672,198 @@ memcpyDatum(void *target, SpGistTypeDesc *att, Datum datum)
}
/*
- * Construct a leaf tuple containing the given heap TID and datum value
+ * Private version of heap_compute_data_size with start address not
+ * necessarily MAXALIGNed. The reason is that start address (and alignment)
+ * influence alignment of each of next values and overall size of included
+ * data area in SpGiST leaf tuple.
+ */
+Size
+spgIncludedDataSize(TupleDesc tupleDesc,
+ Datum *values,
+ bool *isnull, Size start)
+{
+ Size data_length = 0;
+ int i;
+ int numberOfAttributes = tupleDesc->natts;
+
+ data_length = start;
+ for (i = 0; i < numberOfAttributes; i++)
+ {
+ Datum val;
+ Form_pg_attribute atti;
+
+ if (isnull[i])
+ continue;
+
+ val = values[i];
+ atti = TupleDescAttr(tupleDesc, i);
+
+ if (ATT_IS_PACKABLE(atti) &&
+ VARATT_CAN_MAKE_SHORT(DatumGetPointer(val)))
+ {
+ /*
+ * we're anticipating converting to a short varlena header, so
+ * adjust length and don't count any alignment
+ */
+ data_length += VARATT_CONVERTED_SHORT_SIZE(DatumGetPointer(val));
+ }
+ else if (atti->attlen == -1 &&
+ VARATT_IS_EXTERNAL_EXPANDED(DatumGetPointer(val)))
+ {
+ /*
+ * we want to flatten the expanded value so that the constructed
+ * tuple doesn't depend on it
+ */
+ data_length = att_align_nominal(data_length, atti->attalign);
+ data_length += EOH_get_flat_size(DatumGetEOHP(val));
+ }
+ else
+ {
+ data_length = att_align_datum(data_length, atti->attalign,
+ atti->attlen, val);
+ data_length = att_addlength_datum(data_length, atti->attlen,
+ val);
+ }
+ }
+ return data_length-start;
+}
+
+/* Calculate overall leaf tuple size. SGLTHDRSZ is MAXALIGNed only for backward
+ * compatibility and there might be gap between header and key data. After key
+ * data there are no such gaps more than is is necessary for each value
+ * alignment. Overall result is MAXALIGNed.*/
+unsigned int SpgLeafSize(SpGistState *state, Datum *datum, bool *isnull)
+{
+ /* compute space needed, nullmask size and offset for include attributes */
+ unsigned int size = SGLTHDRSZ;
+ unsigned int i;
+ if (!isnull[0])
+ /* key attribute size (not maxaligned) */
+ size += SpGistGetTypeSize(&state->attLeafType, datum[0]);
+
+ if (state->includeTupdesc)
+ {
+ Assert(state->includeTupdesc->natts);
+ if (state->includeTupdesc->natts + 1 >= INDEX_MAX_KEYS)
+ ereport(ERROR,
+ (errcode(ERRCODE_TOO_MANY_COLUMNS),
+ errmsg("number of index columns (%d) exceeds limit (%d)",
+ state->includeTupdesc->natts, INDEX_MAX_KEYS)));
+ /* nullmask size */
+ for (i = 1; i <= state->includeTupdesc->natts; i++)
+ {
+ if (isnull[i])
+ {
+ size += (state->includeTupdesc->natts / 8) + 1;
+ break;
+ }
+ }
+ /* overall included attributes size each with added proper alignment. */
+ size += spgIncludedDataSize(state->includeTupdesc, datum+1, isnull+1, size);
+ }
+return MAXALIGN(size);
+}
+
+/*
+ * Construct a leaf tuple containing the given heap TID, key data and included
+ * columns data. Key data starts from MAXALIGN boundary for backward compatibility.
+ * Nullmask apply only to included attributes and is placed just after key data if
+ * there is at least one NULL among included attributes. It doesn't need alignment.
+ * Then all included columns data follow aligned by their typealign's.
*/
SpGistLeafTuple
spgFormLeafTuple(SpGistState *state, ItemPointer heapPtr,
- Datum datum, bool isnull)
+ Datum *datum, bool *isnull)
{
SpGistLeafTuple tup;
- unsigned int size;
+ unsigned int size=SGLTHDRSZ;
+ unsigned int include_offset=0;
+ unsigned int nullmask_size=0;
+ unsigned int data_offset=0;
+ unsigned int data_size=0;
+ uint16 tupmask=0;
+ int i;
- /* compute space needed (note result is already maxaligned) */
- size = SGLTHDRSZ;
- if (!isnull)
- size += SpGistGetTypeSize(&state->attLeafType, datum);
+ /*
+ * Calculate space needed. If there are include attributes also calculate sizes and
+ * offsets needed for heap_fill_tuple
+ */
+ if (!isnull[0])
+ /* key attribute size (not maxaligned) */
+ size += SpGistGetTypeSize(&state->attLeafType, datum[0]);
+
+ if (state->includeTupdesc)
+ {
+ Assert(state->includeTupdesc->natts);
+ if (state->includeTupdesc->natts + 1 >= INDEX_MAX_KEYS)
+ ereport(ERROR,
+ (errcode(ERRCODE_TOO_MANY_COLUMNS),
+ errmsg("number of index columns (%d) exceeds limit (%d)",
+ state->includeTupdesc->natts, INDEX_MAX_KEYS)));
+
+ include_offset = size;
+
+ for (i = 1; i <= state->includeTupdesc->natts; i++)
+ {
+ if (isnull[i])
+ {
+ nullmask_size = (state->includeTupdesc->natts / 8) + 1;
+ size += nullmask_size;
+ break;
+ }
+ }
+
+ /*
+ * Alignment of all included attributes is counted inside data_size. data_offset
+ * itself is not aligned.
+ */
+ data_size = spgIncludedDataSize(state->includeTupdesc, datum+1, isnull+1, size);
+ data_offset=size;
+
+ size += data_size;
+ }
/*
* Ensure that we can replace the tuple with a dead tuple later. This
- * test is unnecessary when !isnull, but let's be safe.
+ * test is unnecessary when !isnull[0], but let's be safe.
*/
if (size < SGDTSIZE)
size = SGDTSIZE;
/* OK, form the tuple */
- tup = (SpGistLeafTuple) palloc0(size);
+ tup = (SpGistLeafTuple) palloc0(MAXALIGN(size));
- tup->size = size;
- tup->nextOffset = InvalidOffsetNumber;
+ tup->size = MAXALIGN(size);
+ SGLT_SET_OFFSET(tup->nextOffset, InvalidOffsetNumber);
tup->heapPtr = *heapPtr;
- if (!isnull)
- memcpyDatum(SGLTDATAPTR(tup), &state->attLeafType, datum);
+ if (!isnull[0])
+ memcpyDatum(SGLTDATAPTR(tup), &state->attLeafType, datum[0]);
+
+ /* Add included columns data to leaf tuple if any. */
+ if (state->includeTupdesc)
+ {
+ /* The start of include attributes tuple is not aligned by default. All values
+ * alignment should be done by heap_fill_tuple automaticaly. If there is a nulls
+ * mask it is included just after key attribute data and it should not be aligned.
+ */
+ heap_fill_tuple(state->includeTupdesc, datum+1, isnull+1,
+ (char *) tup + data_offset,
+ data_size, &tupmask,
+ (nullmask_size ? (bits8 *) tup + include_offset : NULL) );
+
+ if (nullmask_size)
+ SGLT_SET_CONTAINSNULLMASK(tup->nextOffset, 1);
+
+ /*
+ * We do this because heap_fill_tuple wants to initialize a "tupmask"
+ * which is used for HeapTuples, but the only relevant info is the
+ * "has variable attributes" field. We have already set the hasnull
+ * bit above.
+ */
+ if (tupmask & HEAP_HASVARWIDTH)
+ SGLT_SET_CONTAINSVARATT(tup->nextOffset, 1);
+ }
return tup;
}
@@ -688,10 +880,10 @@ spgFormNodeTuple(SpGistState *state, Datum label, bool isnull)
unsigned int size;
unsigned short infomask = 0;
- /* compute space needed (note result is already maxaligned) */
+ /* compute space needed*/
size = SGNTHDRSZ;
if (!isnull)
- size += SpGistGetTypeSize(&state->attLabelType, label);
+ size += MAXALIGN(SpGistGetTypeSize(&state->attLabelType, label));
/*
* Here we make sure that the size will fit in the field reserved for it
@@ -735,7 +927,7 @@ spgFormInnerTuple(SpGistState *state, bool hasPrefix, Datum prefix,
/* Compute size needed */
if (hasPrefix)
- prefixSize = SpGistGetTypeSize(&state->attPrefixType, prefix);
+ prefixSize = MAXALIGN(SpGistGetTypeSize(&state->attPrefixType, prefix));
else
prefixSize = 0;
@@ -1046,3 +1238,128 @@ spgproperty(Oid index_oid, int attno,
return true;
}
+
+/*
+ * Convert an SpGist tuple into palloc'd Datum/isnull arrays.
+ *
+ */
+void
+SpGistDeformLeafTuple(SpGistLeafTuple tup, SpGistState *state, Datum *datum, bool *isnull,
+ bool key_isnull)
+{
+ unsigned int include_offset;/* offset of include data */
+ int off;
+ bits8 *nullmask_ptr = NULL; /* ptr to null bitmap in tuple */
+ char *tp;
+ bool slow = false; /* can we use/set attcacheoff? */
+ int i;
+
+ if (key_isnull)
+ {
+ datum[0] = (Datum) 0;
+ isnull[0] = true;
+ }
+ else
+ {
+ datum[0] = SGLTDATUM(tup, state);
+ isnull[0] = false;
+ }
+
+ if (state->includeTupdesc)
+ {
+ Assert(state->includeTupdesc->natts);
+ if (state->includeTupdesc->natts + 1 >= INDEX_MAX_KEYS)
+ ereport(ERROR,
+ (errcode(ERRCODE_TOO_MANY_COLUMNS),
+ errmsg("number of index columns (%d) exceeds limit (%d)",
+ state->includeTupdesc->natts, INDEX_MAX_KEYS)));
+
+ include_offset = key_isnull ? SGLTHDRSZ : SGLTHDRSZ + SpGistGetTypeSize(&state->attLeafType, datum[0]);
+
+ tp = (char*) tup;
+ off = include_offset;
+
+ if (SGLT_GET_CONTAINSNULLMASK(tup->nextOffset))
+ {
+ nullmask_ptr = (bits8 *) tp + include_offset;
+ off += (state->includeTupdesc->natts) / 8 + 1;
+ }
+
+ if (state->attLeafType.attlen > 0 && !SGLT_GET_CONTAINSVARATT(tup->nextOffset) &&
+ !SGLT_GET_CONTAINSNULLMASK(tup->nextOffset))
+ /* can use attcacheoff for all attributes */
+ {
+ for (i = 1; i <= state->includeTupdesc->natts; i++)
+ {
+ Form_pg_attribute thisatt = TupleDescAttr(state->includeTupdesc, i - 1);
+ isnull[i] = false;
+ if (thisatt->attcacheoff >= 0)
+ off = thisatt->attcacheoff;
+ else
+ {
+ off = att_align_nominal(off, thisatt->attalign);
+ thisatt->attcacheoff = off;
+ }
+ datum[i] = fetchatt(thisatt, tp + off);
+ off = att_addlength_pointer(off, thisatt->attlen, tp + off);
+ }
+ }
+ else
+ /* general case: can use cache until first null or varlen attribute */
+ {
+ if (state->attLeafType.attlen <= 0)
+ slow = true; /* can't use attcacheoff at all*/
+
+ for (i = 1; i <= state->includeTupdesc->natts; i++)
+ {
+ Form_pg_attribute thisatt = TupleDescAttr(state->includeTupdesc, i - 1);
+
+ if (SGLT_GET_CONTAINSNULLMASK(tup->nextOffset))
+ {
+ if (att_isnull(i - 1, nullmask_ptr))
+ {
+ datum[i] = (Datum) 0;
+ isnull[i] = true;
+ slow = true; /* can't use attcacheoff anymore */
+ continue;
+ }
+ }
+
+ isnull[i] = false;
+
+ if (!slow && thisatt->attcacheoff >= 0)
+ off = thisatt->attcacheoff;
+ else if (thisatt->attlen == -1)
+ {
+ /*
+ * We can only cache the offset for a varlena attribute if the
+ * offset is already suitably aligned, so that there would be no
+ * pad bytes in any case: then the offset will be valid for either
+ * an aligned or unaligned value.
+ */
+ if (!slow && off == att_align_nominal(off, thisatt->attalign))
+ thisatt->attcacheoff = off;
+ else
+ {
+ off = att_align_pointer(off, thisatt->attalign, -1, tp + off);
+ slow = true;
+ }
+ }
+ else
+ {
+ /* not varlena, so safe to use att_align_nominal */
+ off = att_align_nominal(off, thisatt->attalign);
+
+ if (!slow)
+ thisatt->attcacheoff = off;
+ }
+
+ datum[i] = fetchatt(thisatt, tp + off);
+ off = att_addlength_pointer(off, thisatt->attlen, tp + off);
+
+ if (thisatt->attlen <= 0)
+ slow = true; /* can't use attcacheoff anymore */
+ }
+ }
+ }
+}
diff --git a/src/backend/access/spgist/spgvacuum.c b/src/backend/access/spgist/spgvacuum.c
index bd98707f3c..a9433f0ad4 100644
--- a/src/backend/access/spgist/spgvacuum.c
+++ b/src/backend/access/spgist/spgvacuum.c
@@ -168,23 +168,25 @@ vacuumLeafPage(spgBulkDeleteState *bds, Relation index, Buffer buffer,
}
/* Form predecessor map, too */
- if (lt->nextOffset != InvalidOffsetNumber)
+ if (SGLT_GET_OFFSET(lt->nextOffset) != InvalidOffsetNumber)
{
/* paranoia about corrupted chain links */
- if (lt->nextOffset < FirstOffsetNumber ||
- lt->nextOffset > max ||
- predecessor[lt->nextOffset] != InvalidOffsetNumber)
+ if (SGLT_GET_OFFSET(lt->nextOffset) < FirstOffsetNumber ||
+ SGLT_GET_OFFSET(lt->nextOffset) > max ||
+ predecessor[SGLT_GET_OFFSET(lt->nextOffset)] != InvalidOffsetNumber)
elog(ERROR, "inconsistent tuple chain links in page %u of index \"%s\"",
BufferGetBlockNumber(buffer),
RelationGetRelationName(index));
- predecessor[lt->nextOffset] = i;
+ predecessor[SGLT_GET_OFFSET(lt->nextOffset)] = i;
}
}
else if (lt->tupstate == SPGIST_REDIRECT)
{
SpGistDeadTuple dt = (SpGistDeadTuple) lt;
- Assert(dt->nextOffset == InvalidOffsetNumber);
+ // Dead tuple nextOffset is allowed to have highest bit 0 or 1 in case it is
+ // inherited from SpGistLeafTuple where it has its own meaning.
+ Assert(SGLT_GET_OFFSET(dt->nextOffset) == InvalidOffsetNumber);
Assert(ItemPointerIsValid(&dt->pointer));
/*
@@ -201,7 +203,7 @@ vacuumLeafPage(spgBulkDeleteState *bds, Relation index, Buffer buffer,
}
else
{
- Assert(lt->nextOffset == InvalidOffsetNumber);
+ Assert(SGLT_GET_OFFSET(lt->nextOffset) == InvalidOffsetNumber);
}
}
@@ -250,7 +252,7 @@ vacuumLeafPage(spgBulkDeleteState *bds, Relation index, Buffer buffer,
prevLive = deletable[i] ? InvalidOffsetNumber : i;
/* scan down the chain ... */
- j = head->nextOffset;
+ j = SGLT_GET_OFFSET(head->nextOffset);
while (j != InvalidOffsetNumber)
{
SpGistLeafTuple lt;
@@ -301,7 +303,7 @@ vacuumLeafPage(spgBulkDeleteState *bds, Relation index, Buffer buffer,
interveningDeletable = false;
}
- j = lt->nextOffset;
+ j = SGLT_GET_OFFSET(lt->nextOffset);
}
if (prevLive == InvalidOffsetNumber)
@@ -366,7 +368,7 @@ vacuumLeafPage(spgBulkDeleteState *bds, Relation index, Buffer buffer,
lt = (SpGistLeafTuple) PageGetItem(page,
PageGetItemId(page, chainSrc[i]));
Assert(lt->tupstate == SPGIST_LIVE);
- lt->nextOffset = chainDest[i];
+ SGLT_SET_OFFSET(lt->nextOffset, chainDest[i]);
}
MarkBufferDirty(buffer);
diff --git a/src/backend/access/spgist/spgxlog.c b/src/backend/access/spgist/spgxlog.c
index 7be2291d07..4022e3af07 100644
--- a/src/backend/access/spgist/spgxlog.c
+++ b/src/backend/access/spgist/spgxlog.c
@@ -122,8 +122,8 @@ spgRedoAddLeaf(XLogReaderState *record)
head = (SpGistLeafTuple) PageGetItem(page,
PageGetItemId(page, xldata->offnumHeadLeaf));
- Assert(head->nextOffset == leafTupleHdr.nextOffset);
- head->nextOffset = xldata->offnumLeaf;
+ Assert(SGLT_GET_OFFSET(head->nextOffset) == SGLT_GET_OFFSET(leafTupleHdr.nextOffset));
+ SGLT_SET_OFFSET(head->nextOffset, xldata->offnumLeaf);
}
}
else
@@ -822,7 +822,7 @@ spgRedoVacuumLeaf(XLogReaderState *record)
lt = (SpGistLeafTuple) PageGetItem(page,
PageGetItemId(page, chainSrc[i]));
Assert(lt->tupstate == SPGIST_LIVE);
- lt->nextOffset = chainDest[i];
+ SGLT_SET_OFFSET(lt->nextOffset, chainDest[i]);
}
PageSetLSN(page, lsn);
diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
index e976201030..514d5e21e4 100644
--- a/src/backend/utils/error/elog.c
+++ b/src/backend/utils/error/elog.c
@@ -2,8 +2,6 @@
*
* elog.c
* error logging and reporting
- *
- * Because of the extremely high rate at which log messages can be generated,
* we need to be mindful of the performance cost of obtaining any information
* that may be logged. Also, it's important to keep in mind that this code may
* get called from within an aborted transaction, in which case operations
@@ -244,6 +242,7 @@ errstart(int elevel, const char *domain)
*/
if (elevel >= ERROR)
{
+ // abort();
/*
* If we are inside a critical section, all errors become PANIC
* errors. See miscadmin.h.
diff --git a/src/include/access/spgist_private.h b/src/include/access/spgist_private.h
index 00b98ec6a0..c16ee8c322 100644
--- a/src/include/access/spgist_private.h
+++ b/src/include/access/spgist_private.h
@@ -141,6 +141,7 @@ typedef struct SpGistState
SpGistTypeDesc attLeafType; /* type of leaf-tuple values */
SpGistTypeDesc attPrefixType; /* type of inner-tuple prefix values */
SpGistTypeDesc attLabelType; /* type of node label values */
+ TupleDesc includeTupdesc; /* tuple descriptor of included columns */
char *deadTupleStorage; /* workspace for spgFormDeadTuple */
@@ -148,6 +149,91 @@ typedef struct SpGistState
bool isBuild; /* true if doing index build */
} SpGistState;
+/*
+ * SPGiST leaf tuple: carries a datum and a heap tuple TID
+ *
+ * In the simplest case, the datum is the same as the indexed value; but
+ * it could also be a suffix or some other sort of delta that permits
+ * reconstruction given knowledge of the prefix path traversed to get here.
+ *
+ * The size field is wider than could possibly be needed for an on-disk leaf
+ * tuple, but this allows us to form leaf tuples even when the datum is too
+ * wide to be stored immediately, and it costs nothing because of alignment
+ * considerations.
+ *
+ * Normally, nextOffset links to the next tuple belonging to the same parent
+ * node (which must be on the same page). But when the root page is a leaf
+ * page, we don't chain its tuples, so nextOffset is always 0 on the root.
+ *
+ * size must be a multiple of MAXALIGN; also, it must be at least SGDTSIZE
+ * so that the tuple can be converted to REDIRECT status later. (This
+ * restriction only adds bytes for the null-datum case, otherwise alignment
+ * restrictions force it anyway.)
+ *
+ * In a leaf tuple for a NULL indexed value, there's no useful datum value;
+ * however, the SGDTSIZE limit ensures that's there's a Datum word there
+ * anyway, so SGLTDATUM can be applied safely as long as you don't do
+ * anything with the result.
+ *
+ * As SpGistLeafTuple has header of 8 bytes so max value for nextOffset is
+ * (when page size is 65KB) is 8192 and 15 bit is sufficient to store it. So
+ * higher bit is reserved to store information is there nulls mask between leaf
+ * datum and first include value (if any). Size of null mask is 1 byte per each 8
+ * include columns.
+ */
+
+typedef struct SpGistLeafTupleData
+{
+ unsigned int tupstate:2, /* LIVE/REDIRECT/DEAD/PLACEHOLDER */
+ size:30; /* large enough for any palloc'able value */
+ OffsetNumber nextOffset; /* higher 1 bit = 1 if included values has nulls
+ 2 bit = 1 if included values contain variable length values
+ lower 15 bits - nextOffset - points to the next tuple in chain,
+ or InvalidOffsetNumber. They SHOULD NOT be set/read directly,
+ SGLT_SET_OFFSET/SGLT_GET_OFFSET macro must be used instead. */
+ ItemPointerData heapPtr; /* TID of represented heap tuple */
+ /* leaf datum follows */
+ /* if SGLT_GET_CONTAINSNULLMASK nullmask follows. Its size (number of included columns/8)+1 */
+ /* include attributes follow if any*/
+} SpGistLeafTupleData;
+
+typedef SpGistLeafTupleData *SpGistLeafTuple;
+
+#define SGLTHDRSZ MAXALIGN(sizeof(SpGistLeafTupleData))
+#define SGLTDATAPTR(x) (((char *) (x)) + SGLTHDRSZ)
+#define SGLTDATUM(x, s) ((s)->attLeafType.attbyval ? \
+ *(Datum *) SGLTDATAPTR(x) : \
+ PointerGetDatum(SGLTDATAPTR(x)))
+/*
+ * Accessor macros for nextOffset and null mask presence bit.
+ * It's a bit of hack that these macros also safely apply to IncludeTupMetadata which has the same
+ * structure. Include tuple size of maximum 13 bits (see INDEX_SIZE_MASK) is stored there instead
+ * of NextOffset which is 14 bits. IncludeTupMetadata is a vehicle to transfer included tuple header
+ * as IncludeTuple is now filled before SpGistLeafTuple initialized.
+ */
+#define SGLT_GET_OFFSET(x) ( (x) & 0x3FFF )
+#define SGLT_GET_CONTAINSNULLMASK(x) ( (x) >> 15 )
+#define SGLT_GET_CONTAINSVARATT(x) ( ( (x) & 4000 ) >> 14 )
+#define SGLT_SET_OFFSET(x,o) ( (x) = ( (x) & 0xC000 ) | ( (o) & 0x3FFF) )
+#define SGLT_SET_CONTAINSNULLMASK(x,n) ( (x) = ( (n) << 15 ) | ( (x) & 0x3FFF ) )
+#define SGLT_SET_CONTAINSVARATT(x,v) ( (x) = ( (v) << 14 ) | ( (x) & 0xBFFF ) )
+
+#define SGLT_GET_INCLUDE_TUPSIZE(x) SGLT_GET_OFFSET(x)
+#define SGLT_SET_INCLUDE_TUPSIZE(x,o) SGLT_SET_OFFSET(x,o)
+
+extern char *SpGistFormIncludeTuple(TupleDesc tupleDescriptor, Datum *values,
+ bool *isnull, uint16 *tupdata);
+/*
+ * SPGiST dead tuple: declaration for examining non-live tuples
+ *
+ * The tupstate field of this struct must match those of regular inner and
+ * leaf tuples, and its size field must match a leaf tuple's.
+ * Also, the pointer field must be in the same place as a leaf tuple's heapPtr
+ * field, to satisfy some Asserts that we make when replacing a leaf tuple
+ * with a dead tuple.
+ * We don't use nextOffset, but it's needed to align the pointer field.
+ */
+
typedef struct SpGistSearchItem
{
pairingheap_node phNode; /* pairing heap node */
@@ -160,14 +246,14 @@ typedef struct SpGistSearchItem
bool isLeaf; /* SearchItem is heap item */
bool recheck; /* qual recheck is needed */
bool recheckDistances; /* distance recheck is needed */
-
+ SpGistLeafTuple leafTuple;
/* array with numberOfOrderBys entries */
double distances[FLEXIBLE_ARRAY_MEMBER];
+ /* if there are include columns SpGistLeafTupleData follow */
} SpGistSearchItem;
#define SizeOfSpGistSearchItem(n_distances) \
(offsetof(SpGistSearchItem, distances) + sizeof(double) * (n_distances))
-
/*
* Private state of an index scan
*/
@@ -241,6 +327,7 @@ typedef struct SpGistCache
SpGistTypeDesc attLeafType; /* type of leaf-tuple values */
SpGistTypeDesc attPrefixType; /* type of inner-tuple prefix values */
SpGistTypeDesc attLabelType; /* type of node label values */
+ TupleDesc includeTupdesc;
SpGistLUPCache lastUsedPages; /* local storage of last-used info */
} SpGistCache;
@@ -321,60 +408,6 @@ typedef SpGistNodeTupleData *SpGistNodeTuple;
*(Datum *) SGNTDATAPTR(x) : \
PointerGetDatum(SGNTDATAPTR(x)))
-/*
- * SPGiST leaf tuple: carries a datum and a heap tuple TID
- *
- * In the simplest case, the datum is the same as the indexed value; but
- * it could also be a suffix or some other sort of delta that permits
- * reconstruction given knowledge of the prefix path traversed to get here.
- *
- * The size field is wider than could possibly be needed for an on-disk leaf
- * tuple, but this allows us to form leaf tuples even when the datum is too
- * wide to be stored immediately, and it costs nothing because of alignment
- * considerations.
- *
- * Normally, nextOffset links to the next tuple belonging to the same parent
- * node (which must be on the same page). But when the root page is a leaf
- * page, we don't chain its tuples, so nextOffset is always 0 on the root.
- *
- * size must be a multiple of MAXALIGN; also, it must be at least SGDTSIZE
- * so that the tuple can be converted to REDIRECT status later. (This
- * restriction only adds bytes for the null-datum case, otherwise alignment
- * restrictions force it anyway.)
- *
- * In a leaf tuple for a NULL indexed value, there's no useful datum value;
- * however, the SGDTSIZE limit ensures that's there's a Datum word there
- * anyway, so SGLTDATUM can be applied safely as long as you don't do
- * anything with the result.
- */
-typedef struct SpGistLeafTupleData
-{
- unsigned int tupstate:2, /* LIVE/REDIRECT/DEAD/PLACEHOLDER */
- size:30; /* large enough for any palloc'able value */
- OffsetNumber nextOffset; /* next tuple in chain, or InvalidOffsetNumber */
- ItemPointerData heapPtr; /* TID of represented heap tuple */
- /* leaf datum follows */
-} SpGistLeafTupleData;
-
-typedef SpGistLeafTupleData *SpGistLeafTuple;
-
-#define SGLTHDRSZ MAXALIGN(sizeof(SpGistLeafTupleData))
-#define SGLTDATAPTR(x) (((char *) (x)) + SGLTHDRSZ)
-#define SGLTDATUM(x, s) ((s)->attLeafType.attbyval ? \
- *(Datum *) SGLTDATAPTR(x) : \
- PointerGetDatum(SGLTDATAPTR(x)))
-
-/*
- * SPGiST dead tuple: declaration for examining non-live tuples
- *
- * The tupstate field of this struct must match those of regular inner and
- * leaf tuples, and its size field must match a leaf tuple's.
- * Also, the pointer field must be in the same place as a leaf tuple's heapPtr
- * field, to satisfy some Asserts that we make when replacing a leaf tuple
- * with a dead tuple.
- * We don't use nextOffset, but it's needed to align the pointer field.
- * pointer and xid are only valid when tupstate = REDIRECT.
- */
typedef struct SpGistDeadTupleData
{
unsigned int tupstate:2, /* LIVE/REDIRECT/DEAD/PLACEHOLDER */
@@ -394,7 +427,6 @@ typedef SpGistDeadTupleData *SpGistDeadTuple;
* size plus sizeof(ItemIdData) (for the line pointer). This works correctly
* so long as tuple sizes are always maxaligned.
*/
-
/* Page capacity after allowing for fixed header and special space */
#define SPGIST_PAGE_CAPACITY \
MAXALIGN_DOWN(BLCKSZ - \
@@ -456,9 +488,10 @@ extern void SpGistInitPage(Page page, uint16 f);
extern void SpGistInitBuffer(Buffer b, uint16 f);
extern void SpGistInitMetapage(Page page);
extern unsigned int SpGistGetTypeSize(SpGistTypeDesc *att, Datum datum);
+extern unsigned int SpgLeafSize(SpGistState *state, Datum *datum, bool *isnull);
extern SpGistLeafTuple spgFormLeafTuple(SpGistState *state,
ItemPointer heapPtr,
- Datum datum, bool isnull);
+ Datum *datum, bool *isnull);
extern SpGistNodeTuple spgFormNodeTuple(SpGistState *state,
Datum label, bool isnull);
extern SpGistInnerTuple spgFormInnerTuple(SpGistState *state,
@@ -466,6 +499,8 @@ extern SpGistInnerTuple spgFormInnerTuple(SpGistState *state,
int nNodes, SpGistNodeTuple *nodes);
extern SpGistDeadTuple spgFormDeadTuple(SpGistState *state, int tupstate,
BlockNumber blkno, OffsetNumber offnum);
+extern void SpGistDeformLeafTuple(SpGistLeafTuple tup, SpGistState *state,
+ Datum *datum, bool *isnull, bool key_value_isnull);
extern Datum *spgExtractNodeLabels(SpGistState *state,
SpGistInnerTuple innerTuple);
extern OffsetNumber SpGistPageAddNewItem(SpGistState *state, Page page,
@@ -484,7 +519,7 @@ extern void spgPageIndexMultiDelete(SpGistState *state, Page page,
int firststate, int reststate,
BlockNumber blkno, OffsetNumber offnum);
extern bool spgdoinsert(Relation index, SpGistState *state,
- ItemPointer heapPtr, Datum datum, bool isnull);
+ ItemPointer heapPtr, Datum *datum, bool *isnull);
/* spgproc.c */
extern double *spg_key_orderbys_distances(Datum key, bool isLeaf,
diff --git a/src/test/regress/expected/amutils.out b/src/test/regress/expected/amutils.out
index d92a6d12c6..93e6a43b6d 100644
--- a/src/test/regress/expected/amutils.out
+++ b/src/test/regress/expected/amutils.out
@@ -169,9 +169,9 @@ select amname, prop, pg_indexam_has_property(a.oid, prop) as p
hash | bogus |
spgist | can_order | f
spgist | can_unique | f
- spgist | can_multi_col | f
+ spgist | can_multi_col | t
spgist | can_exclude | t
- spgist | can_include | f
+ spgist | can_include | t
spgist | bogus |
(36 rows)
diff --git a/src/test/regress/expected/index_including.out b/src/test/regress/expected/index_including.out
index 8e5d53e712..4fd2b7e878 100644
--- a/src/test/regress/expected/index_including.out
+++ b/src/test/regress/expected/index_including.out
@@ -356,7 +356,6 @@ CREATE INDEX on tbl USING brin(c1, c2) INCLUDE (c3, c4);
ERROR: access method "brin" does not support included columns
CREATE INDEX on tbl USING gist(c3) INCLUDE (c1, c4);
CREATE INDEX on tbl USING spgist(c3) INCLUDE (c4);
-ERROR: access method "spgist" does not support included columns
CREATE INDEX on tbl USING gin(c1, c2) INCLUDE (c3, c4);
ERROR: access method "gin" does not support included columns
CREATE INDEX on tbl USING hash(c1, c2) INCLUDE (c3, c4);
diff --git a/src/test/regress/expected/index_including_spgist.out b/src/test/regress/expected/index_including_spgist.out
new file mode 100644
index 0000000000..fa64766fb7
--- /dev/null
+++ b/src/test/regress/expected/index_including_spgist.out
@@ -0,0 +1,139 @@
+/*
+ * 1.1. test CREATE INDEX with buffered build
+ */
+-- Regular index with included columns
+CREATE TABLE tbl_spgist (c1 int, c2 int, c3 int, c4 box);
+-- size is chosen to exceed page size and trigger actual truncation
+INSERT INTO tbl_spgist SELECT x, 2*x, 3*x, box(point(x,x+1),point(2*x,2*x+1)) FROM generate_series(1,8000) AS x;
+CREATE INDEX tbl_spgist_idx ON tbl_spgist using spgist (c4) INCLUDE (c1,c2,c3);
+SELECT pg_get_indexdef(i.indexrelid)
+FROM pg_index i JOIN pg_class c ON i.indexrelid = c.oid
+WHERE i.indrelid = 'tbl_spgist'::regclass ORDER BY c.relname;
+ pg_get_indexdef
+-----------------------------------------------------------------------------------------
+ CREATE INDEX tbl_spgist_idx ON public.tbl_spgist USING spgist (c4) INCLUDE (c1, c2, c3)
+(1 row)
+
+SELECT * FROM tbl_spgist where c4 <@ box(point(1,1),point(10,10));
+ c1 | c2 | c3 | c4
+----+----+----+-------------
+ 1 | 2 | 3 | (2,3),(1,2)
+ 2 | 4 | 6 | (4,5),(2,3)
+ 3 | 6 | 9 | (6,7),(3,4)
+ 4 | 8 | 12 | (8,9),(4,5)
+(4 rows)
+
+SET enable_bitmapscan TO off;
+EXPLAIN (costs off) SELECT * FROM tbl_spgist where c4 <@ box(point(1,1),point(10,10));
+ QUERY PLAN
+----------------------------------------------------
+ Index Only Scan using tbl_spgist_idx on tbl_spgist
+ Index Cond: (c4 <@ '(10,10),(1,1)'::box)
+(2 rows)
+
+SET enable_bitmapscan TO default;
+DROP TABLE tbl_spgist;
+/*
+ * 1.2. test CREATE INDEX with inserts
+ */
+-- Regular index with included columns
+CREATE TABLE tbl_spgist (c1 int, c2 int, c3 int, c4 box);
+-- size is chosen to exceed page size and trigger actual truncation
+CREATE INDEX tbl_spgist_idx ON tbl_spgist using spgist (c4) INCLUDE (c1,c2,c3);
+INSERT INTO tbl_spgist SELECT x, 2*x, 3*x, box(point(x,x+1),point(2*x,2*x+1)) FROM generate_series(1,8000) AS x;
+SELECT pg_get_indexdef(i.indexrelid)
+FROM pg_index i JOIN pg_class c ON i.indexrelid = c.oid
+WHERE i.indrelid = 'tbl_spgist'::regclass ORDER BY c.relname;
+ pg_get_indexdef
+-----------------------------------------------------------------------------------------
+ CREATE INDEX tbl_spgist_idx ON public.tbl_spgist USING spgist (c4) INCLUDE (c1, c2, c3)
+(1 row)
+
+SELECT * FROM tbl_spgist where c4 <@ box(point(1,1),point(10,10));
+ c1 | c2 | c3 | c4
+----+----+----+-------------
+ 1 | 2 | 3 | (2,3),(1,2)
+ 2 | 4 | 6 | (4,5),(2,3)
+ 3 | 6 | 9 | (6,7),(3,4)
+ 4 | 8 | 12 | (8,9),(4,5)
+(4 rows)
+
+SET enable_bitmapscan TO off;
+EXPLAIN (costs off) SELECT * FROM tbl_spgist where c4 <@ box(point(1,1),point(10,10));
+ QUERY PLAN
+----------------------------------------------------
+ Index Only Scan using tbl_spgist_idx on tbl_spgist
+ Index Cond: (c4 <@ '(10,10),(1,1)'::box)
+(2 rows)
+
+SET enable_bitmapscan TO default;
+DROP TABLE tbl_spgist;
+/*
+ * 2. CREATE INDEX CONCURRENTLY
+ */
+CREATE TABLE tbl_spgist (c1 int, c2 int, c3 int, c4 box);
+INSERT INTO tbl_spgist SELECT x, 2*x, 3*x, box(point(x,x+1),point(2*x,2*x+1)) FROM generate_series(1,10) AS x;
+CREATE INDEX CONCURRENTLY tbl_spgist_idx ON tbl_spgist using spgist (c4) INCLUDE (c1,c2,c3);
+SELECT indexdef FROM pg_indexes WHERE tablename = 'tbl_spgist' ORDER BY indexname;
+ indexdef
+-----------------------------------------------------------------------------------------
+ CREATE INDEX tbl_spgist_idx ON public.tbl_spgist USING spgist (c4) INCLUDE (c1, c2, c3)
+(1 row)
+
+DROP TABLE tbl_spgist;
+/*
+ * 3. REINDEX
+ */
+CREATE TABLE tbl_spgist (c1 int, c2 int, c3 int, c4 box);
+INSERT INTO tbl_spgist SELECT x, 2*x, 3*x, box(point(x,x+1),point(2*x,2*x+1)) FROM generate_series(1,10) AS x;
+CREATE INDEX tbl_spgist_idx ON tbl_spgist using spgist (c4) INCLUDE (c1,c3);
+SELECT indexdef FROM pg_indexes WHERE tablename = 'tbl_spgist' ORDER BY indexname;
+ indexdef
+-------------------------------------------------------------------------------------
+ CREATE INDEX tbl_spgist_idx ON public.tbl_spgist USING spgist (c4) INCLUDE (c1, c3)
+(1 row)
+
+REINDEX INDEX tbl_spgist_idx;
+SELECT indexdef FROM pg_indexes WHERE tablename = 'tbl_spgist' ORDER BY indexname;
+ indexdef
+-------------------------------------------------------------------------------------
+ CREATE INDEX tbl_spgist_idx ON public.tbl_spgist USING spgist (c4) INCLUDE (c1, c3)
+(1 row)
+
+ALTER TABLE tbl_spgist DROP COLUMN c1;
+SELECT indexdef FROM pg_indexes WHERE tablename = 'tbl_spgist' ORDER BY indexname;
+ indexdef
+----------
+(0 rows)
+
+DROP TABLE tbl_spgist;
+/*
+ * 4. Update, delete values in indexed table.
+ */
+CREATE TABLE tbl_spgist (c1 int, c2 int, c3 int, c4 box);
+INSERT INTO tbl_spgist SELECT x, 2*x, 3*x, box(point(x,x+1),point(2*x,2*x+1)) FROM generate_series(1,10) AS x;
+CREATE INDEX tbl_spgist_idx ON tbl_spgist using spgist (c4) INCLUDE (c1,c3);
+UPDATE tbl_spgist SET c1 = 100 WHERE c1 = 2;
+UPDATE tbl_spgist SET c1 = 1 WHERE c1 = 3;
+DELETE FROM tbl_spgist WHERE c1 = 5 OR c3 = 12;
+DROP TABLE tbl_spgist;
+/*
+ * 5. Alter column type.
+ */
+CREATE TABLE tbl_spgist (c1 int, c2 int, c3 int, c4 box);
+INSERT INTO tbl_spgist SELECT x, 2*x, 3*x, box(point(x,x+1),point(2*x,2*x+1)) FROM generate_series(1,10) AS x;
+CREATE INDEX tbl_spgist_idx ON tbl_spgist using spgist (c4) INCLUDE (c1,c3);
+ALTER TABLE tbl_spgist ALTER c1 TYPE bigint;
+ALTER TABLE tbl_spgist ALTER c3 TYPE bigint;
+\d tbl_spgist
+ Table "public.tbl_spgist"
+ Column | Type | Collation | Nullable | Default
+--------+---------+-----------+----------+---------
+ c1 | bigint | | |
+ c2 | integer | | |
+ c3 | bigint | | |
+ c4 | box | | |
+Indexes:
+ "tbl_spgist_idx" spgist (c4) INCLUDE (c1, c3)
+
+DROP TABLE tbl_spgist;
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 026ea880cd..985458a1a8 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -50,7 +50,7 @@ test: copy copyselect copydml insert insert_conflict
# ----------
test: create_misc create_operator create_procedure
# These depend on create_misc and create_operator
-test: create_index create_index_spgist create_view index_including index_including_gist
+test: create_index create_index_spgist create_view index_including index_including_gist index_including_spgist
# ----------
# Another group of parallel tests
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index 979d926119..f3df961535 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -68,6 +68,7 @@ test: create_index_spgist
test: create_view
test: index_including
test: index_including_gist
+test: index_including_spgist
test: create_aggregate
test: create_function_3
test: create_cast
diff --git a/src/test/regress/sql/index_including_spgist.sql b/src/test/regress/sql/index_including_spgist.sql
new file mode 100644
index 0000000000..a59e73aa22
--- /dev/null
+++ b/src/test/regress/sql/index_including_spgist.sql
@@ -0,0 +1,81 @@
+/*
+ * 1.1. test CREATE INDEX with buffered build
+ */
+
+-- Regular index with included columns
+CREATE TABLE tbl_spgist (c1 int, c2 int, c3 int, c4 box);
+-- size is chosen to exceed page size and trigger actual truncation
+INSERT INTO tbl_spgist SELECT x, 2*x, 3*x, box(point(x,x+1),point(2*x,2*x+1)) FROM generate_series(1,8000) AS x;
+CREATE INDEX tbl_spgist_idx ON tbl_spgist using spgist (c4) INCLUDE (c1,c2,c3);
+SELECT pg_get_indexdef(i.indexrelid)
+FROM pg_index i JOIN pg_class c ON i.indexrelid = c.oid
+WHERE i.indrelid = 'tbl_spgist'::regclass ORDER BY c.relname;
+SELECT * FROM tbl_spgist where c4 <@ box(point(1,1),point(10,10));
+SET enable_bitmapscan TO off;
+EXPLAIN (costs off) SELECT * FROM tbl_spgist where c4 <@ box(point(1,1),point(10,10));
+SET enable_bitmapscan TO default;
+DROP TABLE tbl_spgist;
+
+/*
+ * 1.2. test CREATE INDEX with inserts
+ */
+
+-- Regular index with included columns
+CREATE TABLE tbl_spgist (c1 int, c2 int, c3 int, c4 box);
+-- size is chosen to exceed page size and trigger actual truncation
+CREATE INDEX tbl_spgist_idx ON tbl_spgist using spgist (c4) INCLUDE (c1,c2,c3);
+INSERT INTO tbl_spgist SELECT x, 2*x, 3*x, box(point(x,x+1),point(2*x,2*x+1)) FROM generate_series(1,8000) AS x;
+SELECT pg_get_indexdef(i.indexrelid)
+FROM pg_index i JOIN pg_class c ON i.indexrelid = c.oid
+WHERE i.indrelid = 'tbl_spgist'::regclass ORDER BY c.relname;
+SELECT * FROM tbl_spgist where c4 <@ box(point(1,1),point(10,10));
+SET enable_bitmapscan TO off;
+EXPLAIN (costs off) SELECT * FROM tbl_spgist where c4 <@ box(point(1,1),point(10,10));
+SET enable_bitmapscan TO default;
+DROP TABLE tbl_spgist;
+
+/*
+ * 2. CREATE INDEX CONCURRENTLY
+ */
+CREATE TABLE tbl_spgist (c1 int, c2 int, c3 int, c4 box);
+INSERT INTO tbl_spgist SELECT x, 2*x, 3*x, box(point(x,x+1),point(2*x,2*x+1)) FROM generate_series(1,10) AS x;
+CREATE INDEX CONCURRENTLY tbl_spgist_idx ON tbl_spgist using spgist (c4) INCLUDE (c1,c2,c3);
+SELECT indexdef FROM pg_indexes WHERE tablename = 'tbl_spgist' ORDER BY indexname;
+DROP TABLE tbl_spgist;
+
+
+/*
+ * 3. REINDEX
+ */
+CREATE TABLE tbl_spgist (c1 int, c2 int, c3 int, c4 box);
+INSERT INTO tbl_spgist SELECT x, 2*x, 3*x, box(point(x,x+1),point(2*x,2*x+1)) FROM generate_series(1,10) AS x;
+CREATE INDEX tbl_spgist_idx ON tbl_spgist using spgist (c4) INCLUDE (c1,c3);
+SELECT indexdef FROM pg_indexes WHERE tablename = 'tbl_spgist' ORDER BY indexname;
+REINDEX INDEX tbl_spgist_idx;
+SELECT indexdef FROM pg_indexes WHERE tablename = 'tbl_spgist' ORDER BY indexname;
+ALTER TABLE tbl_spgist DROP COLUMN c1;
+SELECT indexdef FROM pg_indexes WHERE tablename = 'tbl_spgist' ORDER BY indexname;
+DROP TABLE tbl_spgist;
+
+/*
+ * 4. Update, delete values in indexed table.
+ */
+CREATE TABLE tbl_spgist (c1 int, c2 int, c3 int, c4 box);
+INSERT INTO tbl_spgist SELECT x, 2*x, 3*x, box(point(x,x+1),point(2*x,2*x+1)) FROM generate_series(1,10) AS x;
+CREATE INDEX tbl_spgist_idx ON tbl_spgist using spgist (c4) INCLUDE (c1,c3);
+UPDATE tbl_spgist SET c1 = 100 WHERE c1 = 2;
+UPDATE tbl_spgist SET c1 = 1 WHERE c1 = 3;
+DELETE FROM tbl_spgist WHERE c1 = 5 OR c3 = 12;
+DROP TABLE tbl_spgist;
+
+/*
+ * 5. Alter column type.
+ */
+CREATE TABLE tbl_spgist (c1 int, c2 int, c3 int, c4 box);
+INSERT INTO tbl_spgist SELECT x, 2*x, 3*x, box(point(x,x+1),point(2*x,2*x+1)) FROM generate_series(1,10) AS x;
+CREATE INDEX tbl_spgist_idx ON tbl_spgist using spgist (c4) INCLUDE (c1,c3);
+ALTER TABLE tbl_spgist ALTER c1 TYPE bigint;
+ALTER TABLE tbl_spgist ALTER c3 TYPE bigint;
+\d tbl_spgist
+DROP TABLE tbl_spgist;
+
Also little bit corrected code formatting.
Show quoted text
Best regards,
Pavel BorisovPostgres Professional: http://postgrespro.com <http://www.postgrespro.com>
Attachments:
spgist-covering-0003.diffapplication/octet-stream; name=spgist-covering-0003.diffDownload
diff --git a/src/backend/access/spgist/spgdoinsert.c b/src/backend/access/spgist/spgdoinsert.c
index 934d65b89f..4c133b7106 100644
--- a/src/backend/access/spgist/spgdoinsert.c
+++ b/src/backend/access/spgist/spgdoinsert.c
@@ -22,7 +22,7 @@
#include "miscadmin.h"
#include "storage/bufmgr.h"
#include "utils/rel.h"
-
+#include "access/htup_details.h"
/*
* SPPageDesc tracks all info about a page we are inserting into. In some
@@ -220,7 +220,7 @@ addLeafTuple(Relation index, SpGistState *state, SpGistLeafTuple leafTuple,
SpGistBlockIsRoot(current->blkno))
{
/* Tuple is not part of a chain */
- leafTuple->nextOffset = InvalidOffsetNumber;
+ SGLT_SET_OFFSET(leafTuple->nextOffset, InvalidOffsetNumber);
current->offnum = SpGistPageAddNewItem(state, current->page,
(Item) leafTuple, leafTuple->size,
NULL, false);
@@ -253,7 +253,7 @@ addLeafTuple(Relation index, SpGistState *state, SpGistLeafTuple leafTuple,
PageGetItemId(current->page, current->offnum));
if (head->tupstate == SPGIST_LIVE)
{
- leafTuple->nextOffset = head->nextOffset;
+ SGLT_SET_OFFSET(leafTuple->nextOffset, SGLT_GET_OFFSET(head->nextOffset));
offnum = SpGistPageAddNewItem(state, current->page,
(Item) leafTuple, leafTuple->size,
NULL, false);
@@ -264,14 +264,14 @@ addLeafTuple(Relation index, SpGistState *state, SpGistLeafTuple leafTuple,
*/
head = (SpGistLeafTuple) PageGetItem(current->page,
PageGetItemId(current->page, current->offnum));
- head->nextOffset = offnum;
+ SGLT_SET_OFFSET(head->nextOffset, offnum);
xlrec.offnumLeaf = offnum;
xlrec.offnumHeadLeaf = current->offnum;
}
else if (head->tupstate == SPGIST_DEAD)
{
- leafTuple->nextOffset = InvalidOffsetNumber;
+ SGLT_SET_OFFSET(leafTuple->nextOffset, InvalidOffsetNumber);
PageIndexTupleDelete(current->page, current->offnum);
if (PageAddItem(current->page,
(Item) leafTuple, leafTuple->size,
@@ -362,13 +362,13 @@ checkSplitConditions(Relation index, SpGistState *state,
{
/* We could see a DEAD tuple as first/only chain item */
Assert(i == current->offnum);
- Assert(it->nextOffset == InvalidOffsetNumber);
+ Assert(SGLT_GET_OFFSET(it->nextOffset) == InvalidOffsetNumber);
/* Don't count it in result, because it won't go to other page */
}
else
elog(ERROR, "unexpected SPGiST tuple state: %d", it->tupstate);
- i = it->nextOffset;
+ i = SGLT_GET_OFFSET(it->nextOffset);
}
*nToSplit = n;
@@ -437,7 +437,7 @@ moveLeafs(Relation index, SpGistState *state,
{
/* We could see a DEAD tuple as first/only chain item */
Assert(i == current->offnum);
- Assert(it->nextOffset == InvalidOffsetNumber);
+ Assert(SGLT_GET_OFFSET(it->nextOffset) == InvalidOffsetNumber);
/* We don't want to move it, so don't count it in size */
toDelete[nDelete] = i;
nDelete++;
@@ -446,7 +446,7 @@ moveLeafs(Relation index, SpGistState *state,
else
elog(ERROR, "unexpected SPGiST tuple state: %d", it->tupstate);
- i = it->nextOffset;
+ i = SGLT_GET_OFFSET(it->nextOffset);
}
/* Find a leaf page that will hold them */
@@ -475,7 +475,7 @@ moveLeafs(Relation index, SpGistState *state,
* don't care). We're modifying the tuple on the source page
* here, but it's okay since we're about to delete it.
*/
- it->nextOffset = r;
+ SGLT_SET_OFFSET(it->nextOffset, r);
r = SpGistPageAddNewItem(state, npage, (Item) it, it->size,
&startOffset, false);
@@ -490,7 +490,7 @@ moveLeafs(Relation index, SpGistState *state,
}
/* add the new tuple as well */
- newLeafTuple->nextOffset = r;
+ SGLT_SET_OFFSET(newLeafTuple->nextOffset, r);
r = SpGistPageAddNewItem(state, npage,
(Item) newLeafTuple, newLeafTuple->size,
&startOffset, false);
@@ -709,6 +709,9 @@ doPickSplit(Relation index, SpGistState *state,
int nToDelete,
nToInsert,
maxToInclude;
+ Datum *leafChainDatums;
+ bool *leafChainIsnulls;
+ const int natts = IndexRelationGetNumberOfAttributes(index);
in.level = level;
@@ -723,14 +726,16 @@ doPickSplit(Relation index, SpGistState *state,
toInsert = (OffsetNumber *) palloc(sizeof(OffsetNumber) * n);
newLeafs = (SpGistLeafTuple *) palloc(sizeof(SpGistLeafTuple) * n);
leafPageSelect = (uint8 *) palloc(sizeof(uint8) * n);
-
STORE_STATE(state, xlrec.stateSrc);
+ leafChainDatums = (Datum *) palloc(n * natts * sizeof(Datum));
+ leafChainIsnulls = (bool *) palloc(n * natts * sizeof(bool));
+
/*
- * Form list of leaf tuples which will be distributed as split result;
- * also, count up the amount of space that will be freed from current.
- * (Note that in the non-root case, we won't actually delete the old
- * tuples, only replace them with redirects or placeholders.)
+ * Collect leaf tuples which will be distributed as split result; also,
+ * count up the amount of space that will be freed from current. (Note
+ * that in the non-root case, we won't actually delete the old tuples,
+ * only replace them with redirects or placeholders.)
*
* Note: the SGLTDATUM calls here are safe even when dealing with a nulls
* page. For a pass-by-value data type we will fetch a word that must
@@ -738,7 +743,15 @@ doPickSplit(Relation index, SpGistState *state,
* tuples must have size at least SGDTSIZE). For a pass-by-reference type
* we are just computing a pointer that isn't going to get dereferenced.
* So it's not worth guarding the calls with isNulls checks.
+ *
+ * Datums and isnulls of all leaf tuple attributes in a chain are
+ * collected into 2-d arrays: (number of tuples in chain) x (number of
+ * attributes) First attribute is key, the other - included attributes (if
+ * any). After picksplit we need to form new leaf tuples as key attribute
+ * length can change which can affect alignment of every include
+ * attribute.
*/
+
nToInsert = 0;
nToDelete = 0;
spaceToDelete = 0;
@@ -759,6 +772,8 @@ doPickSplit(Relation index, SpGistState *state,
{
in.datums[nToInsert] = SGLTDATUM(it, state);
heapPtrs[nToInsert] = it->heapPtr;
+ SpGistDeformLeafTuple(it, state, leafChainDatums + nToInsert * natts,
+ leafChainIsnulls + nToInsert * natts, isNulls);
nToInsert++;
toDelete[nToDelete] = i;
nToDelete++;
@@ -784,6 +799,9 @@ doPickSplit(Relation index, SpGistState *state,
{
in.datums[nToInsert] = SGLTDATUM(it, state);
heapPtrs[nToInsert] = it->heapPtr;
+
+ SpGistDeformLeafTuple(it, state, leafChainDatums + nToInsert * natts,
+ leafChainIsnulls + nToInsert * natts, isNulls);
nToInsert++;
toDelete[nToDelete] = i;
nToDelete++;
@@ -795,7 +813,7 @@ doPickSplit(Relation index, SpGistState *state,
{
/* We could see a DEAD tuple as first/only chain item */
Assert(i == current->offnum);
- Assert(it->nextOffset == InvalidOffsetNumber);
+ Assert(SGLT_GET_OFFSET(it->nextOffset) == InvalidOffsetNumber);
toDelete[nToDelete] = i;
nToDelete++;
/* replacing it with redirect will save no space */
@@ -803,7 +821,7 @@ doPickSplit(Relation index, SpGistState *state,
else
elog(ERROR, "unexpected SPGiST tuple state: %d", it->tupstate);
- i = it->nextOffset;
+ i = SGLT_GET_OFFSET(it->nextOffset);
}
}
in.nTuples = nToInsert;
@@ -816,10 +834,17 @@ doPickSplit(Relation index, SpGistState *state,
*/
in.datums[in.nTuples] = SGLTDATUM(newLeafTuple, state);
heapPtrs[in.nTuples] = newLeafTuple->heapPtr;
+
+ SpGistDeformLeafTuple(newLeafTuple, state, leafChainDatums + (in.nTuples) * natts,
+ leafChainIsnulls + (in.nTuples) * natts, isNulls);
in.nTuples++;
memset(&out, 0, sizeof(out));
+ /*
+ * Process collected key values of tuples from the chain. Included values
+ * are used to build fresh leaf tuples unchanged.
+ */
if (!isNulls)
{
/*
@@ -837,9 +862,11 @@ doPickSplit(Relation index, SpGistState *state,
totalLeafSizes = 0;
for (i = 0; i < in.nTuples; i++)
{
- newLeafs[i] = spgFormLeafTuple(state, heapPtrs + i,
- out.leafTupleDatums[i],
- false);
+ *(leafChainDatums + i * natts) = (Datum) out.leafTupleDatums[i];
+ *(leafChainIsnulls + i * natts) = false;
+
+ newLeafs[i] = spgFormLeafTuple(state, heapPtrs + i, leafChainDatums + i * natts,
+ leafChainIsnulls + i * natts);
totalLeafSizes += newLeafs[i]->size + sizeof(ItemIdData);
}
}
@@ -860,9 +887,14 @@ doPickSplit(Relation index, SpGistState *state,
totalLeafSizes = 0;
for (i = 0; i < in.nTuples; i++)
{
- newLeafs[i] = spgFormLeafTuple(state, heapPtrs + i,
- (Datum) 0,
- true);
+ /*
+ * Nulls tree can contain only null key values.
+ */
+ *(leafChainDatums + i * natts) = (Datum) 0;
+ *(leafChainIsnulls + i * natts) = true;
+
+ newLeafs[i] = spgFormLeafTuple(state, heapPtrs + i, leafChainDatums + i * natts,
+ leafChainIsnulls + i * natts);
totalLeafSizes += newLeafs[i]->size + sizeof(ItemIdData);
}
}
@@ -1196,10 +1228,10 @@ doPickSplit(Relation index, SpGistState *state,
if (ItemPointerIsValid(&nodes[n]->t_tid))
{
Assert(ItemPointerGetBlockNumber(&nodes[n]->t_tid) == leafBlock);
- it->nextOffset = ItemPointerGetOffsetNumber(&nodes[n]->t_tid);
+ SGLT_SET_OFFSET(it->nextOffset, ItemPointerGetOffsetNumber(&nodes[n]->t_tid));
}
else
- it->nextOffset = InvalidOffsetNumber;
+ SGLT_SET_OFFSET(it->nextOffset, InvalidOffsetNumber);
/* Insert it on page */
newoffset = SpGistPageAddNewItem(state, BufferGetPage(leafBuffer),
@@ -1889,67 +1921,83 @@ spgSplitNodeAction(Relation index, SpGistState *state,
*/
bool
spgdoinsert(Relation index, SpGistState *state,
- ItemPointer heapPtr, Datum datum, bool isnull)
+ ItemPointer heapPtr, Datum *datum, bool *isnull)
{
int level = 0;
- Datum leafDatum;
+ Datum *leafDatum;
int leafSize;
SPPageDesc current,
parent;
FmgrInfo *procinfo = NULL;
+ int i;
/*
* Look up FmgrInfo of the user-defined choose function once, to save
* cycles in the loop below.
*/
- if (!isnull)
+ if (!isnull[0])
procinfo = index_getprocinfo(index, 1, SPGIST_CHOOSE_PROC);
/*
* Prepare the leaf datum to insert.
- *
+ */
+
+ leafDatum = (Datum *) palloc0(sizeof(Datum) * (IndexRelationGetNumberOfAttributes(index)));
+
+ /*
* If an optional "compress" method is provided, then call it to form the
- * leaf datum from the input datum. Otherwise store the input datum as
- * is. Since we don't use index_form_tuple in this AM, we have to make
- * sure value to be inserted is not toasted; FormIndexDatum doesn't
- * guarantee that. But we assume the "compress" method to return an
- * untoasted value.
+ * key datum from the input datum. Otherwise store the input datum as is.
+ * Since we don't use index_form_tuple in this AM, we have to make sure
+ * value to be inserted is not toasted; FormIndexDatum doesn't guarantee
+ * that. But we assume the "compress" method to return an untoasted
+ * value.
*/
- if (!isnull)
+ if (!isnull[0])
{
if (OidIsValid(index_getprocid(index, 1, SPGIST_COMPRESS_PROC)))
{
FmgrInfo *compressProcinfo = NULL;
compressProcinfo = index_getprocinfo(index, 1, SPGIST_COMPRESS_PROC);
- leafDatum = FunctionCall1Coll(compressProcinfo,
- index->rd_indcollation[0],
- datum);
+ leafDatum[0] = FunctionCall1Coll(compressProcinfo,
+ index->rd_indcollation[0],
+ datum[0]);
}
else
{
Assert(state->attLeafType.type == state->attType.type);
if (state->attType.attlen == -1)
- leafDatum = PointerGetDatum(PG_DETOAST_DATUM(datum));
+ leafDatum[0] = PointerGetDatum(PG_DETOAST_DATUM(datum[0]));
else
- leafDatum = datum;
+ leafDatum[0] = datum[0];
}
}
else
- leafDatum = (Datum) 0;
+ leafDatum[0] = (Datum) 0;
+
+ for (i = 1; i < IndexRelationGetNumberOfAttributes(index); i++)
+ {
+ if (!isnull[i])
+ {
+ if (TupleDescAttr(state->includeTupdesc, i - 1)->attlen == -1)
+ leafDatum[i] = PointerGetDatum(PG_DETOAST_DATUM(datum[i]));
+ else
+ leafDatum[i] = datum[i];
+ }
+ else
+ leafDatum[i] = (Datum) 0;
+ }
+
/*
- * Compute space needed for a leaf tuple containing the given datum.
+ * Compute space needed on a page for a leaf tuple containing the given
+ * datum.
*
* If it isn't gonna fit, and the opclass can't reduce the datum size by
* suffixing, bail out now rather than getting into an endless loop.
*/
- if (!isnull)
- leafSize = SGLTHDRSZ + sizeof(ItemIdData) +
- SpGistGetTypeSize(&state->attLeafType, leafDatum);
- else
- leafSize = SGDTSIZE + sizeof(ItemIdData);
+ leafSize = SpgLeafSize(state, leafDatum, isnull) + sizeof(ItemIdData);
if (leafSize > SPGIST_PAGE_CAPACITY && !state->config.longValuesOK)
ereport(ERROR,
@@ -1961,7 +2009,7 @@ spgdoinsert(Relation index, SpGistState *state,
errhint("Values larger than a buffer page cannot be indexed.")));
/* Initialize "current" to the appropriate root page */
- current.blkno = isnull ? SPGIST_NULL_BLKNO : SPGIST_ROOT_BLKNO;
+ current.blkno = isnull[0] ? SPGIST_NULL_BLKNO : SPGIST_ROOT_BLKNO;
current.buffer = InvalidBuffer;
current.page = NULL;
current.offnum = FirstOffsetNumber;
@@ -1995,7 +2043,7 @@ spgdoinsert(Relation index, SpGistState *state,
*/
current.buffer =
SpGistGetBuffer(index,
- GBUF_LEAF | (isnull ? GBUF_NULLS : 0),
+ GBUF_LEAF | (isnull[0] ? GBUF_NULLS : 0),
Min(leafSize, SPGIST_PAGE_CAPACITY),
&isNew);
current.blkno = BufferGetBlockNumber(current.buffer);
@@ -2037,7 +2085,7 @@ spgdoinsert(Relation index, SpGistState *state,
current.page = BufferGetPage(current.buffer);
/* should not arrive at a page of the wrong type */
- if (isnull ? !SpGistPageStoresNulls(current.page) :
+ if (isnull[0] ? !SpGistPageStoresNulls(current.page) :
SpGistPageStoresNulls(current.page))
elog(ERROR, "SPGiST index page %u has wrong nulls flag",
current.blkno);
@@ -2054,7 +2102,7 @@ spgdoinsert(Relation index, SpGistState *state,
{
/* it fits on page, so insert it and we're done */
addLeafTuple(index, state, leafTuple,
- ¤t, &parent, isnull, isNew);
+ ¤t, &parent, isnull[0], isNew);
break;
}
else if ((sizeToSplit =
@@ -2068,14 +2116,14 @@ spgdoinsert(Relation index, SpGistState *state,
* chain to another leaf page rather than splitting it.
*/
Assert(!isNew);
- moveLeafs(index, state, ¤t, &parent, leafTuple, isnull);
+ moveLeafs(index, state, ¤t, &parent, leafTuple, isnull[0]);
break; /* we're done */
}
else
{
/* picksplit */
if (doPickSplit(index, state, ¤t, &parent,
- leafTuple, level, isnull, isNew))
+ leafTuple, level, isnull[0], isNew))
break; /* doPickSplit installed new tuples */
/* leaf tuple will not be inserted yet */
@@ -2110,8 +2158,8 @@ spgdoinsert(Relation index, SpGistState *state,
innerTuple = (SpGistInnerTuple) PageGetItem(current.page,
PageGetItemId(current.page, current.offnum));
- in.datum = datum;
- in.leafDatum = leafDatum;
+ in.datum = datum[0];
+ in.leafDatum = leafDatum[0];
in.level = level;
in.allTheSame = innerTuple->allTheSame;
in.hasPrefix = (innerTuple->prefixSize > 0);
@@ -2121,7 +2169,7 @@ spgdoinsert(Relation index, SpGistState *state,
memset(&out, 0, sizeof(out));
- if (!isnull)
+ if (!isnull[0])
{
/* use user-defined choose method */
FunctionCall2Coll(procinfo,
@@ -2158,11 +2206,11 @@ spgdoinsert(Relation index, SpGistState *state,
/* Adjust level as per opclass request */
level += out.result.matchNode.levelAdd;
/* Replace leafDatum and recompute leafSize */
- if (!isnull)
+ if (!isnull[0])
{
- leafDatum = out.result.matchNode.restDatum;
- leafSize = SGLTHDRSZ + sizeof(ItemIdData) +
- SpGistGetTypeSize(&state->attLeafType, leafDatum);
+ leafDatum[0] = out.result.matchNode.restDatum;
+ leafSize = SpgLeafSize(state, leafDatum, isnull) +
+ sizeof(ItemIdData);
}
/*
@@ -2227,6 +2275,6 @@ spgdoinsert(Relation index, SpGistState *state,
SpGistSetLastUsedPage(index, parent.buffer);
UnlockReleaseBuffer(parent.buffer);
}
-
+ pfree(leafDatum);
return true;
}
diff --git a/src/backend/access/spgist/spginsert.c b/src/backend/access/spgist/spginsert.c
index e4508a2b92..b54ae85f6e 100644
--- a/src/backend/access/spgist/spginsert.c
+++ b/src/backend/access/spgist/spginsert.c
@@ -55,8 +55,7 @@ spgistBuildCallback(Relation index, ItemPointer tid, Datum *values,
* lock on some buffer. So we need to be willing to retry. We can flush
* any temp data when retrying.
*/
- while (!spgdoinsert(index, &buildstate->spgstate, tid,
- *values, *isnull))
+ while (!spgdoinsert(index, &buildstate->spgstate, tid, values, isnull))
{
MemoryContextReset(buildstate->tmpCtx);
}
@@ -226,7 +225,7 @@ spginsert(Relation index, Datum *values, bool *isnull,
* to avoid cumulative memory consumption. That means we also have to
* redo initSpGistState(), but it's cheap enough not to matter.
*/
- while (!spgdoinsert(index, &spgstate, ht_ctid, *values, *isnull))
+ while (!spgdoinsert(index, &spgstate, ht_ctid, values, isnull))
{
MemoryContextReset(insertCtx);
initSpGistState(&spgstate, index);
diff --git a/src/backend/access/spgist/spgscan.c b/src/backend/access/spgist/spgscan.c
index 4d506bfb9a..5a3c7c50cf 100644
--- a/src/backend/access/spgist/spgscan.c
+++ b/src/backend/access/spgist/spgscan.c
@@ -28,7 +28,8 @@
typedef void (*storeRes_func) (SpGistScanOpaque so, ItemPointer heapPtr,
Datum leafValue, bool isNull, bool recheck,
- bool recheckDistances, double *distances);
+ bool recheckDistances, double *distances,
+ SpGistLeafTuple leafTuple);
/*
* Pairing heap comparison function for the SpGistSearchItem queue.
@@ -88,6 +89,9 @@ spgFreeSearchItem(SpGistScanOpaque so, SpGistSearchItem *item)
if (item->traversalValue)
pfree(item->traversalValue);
+ if (item->isLeaf && item->leafTuple)
+ pfree(item->leafTuple);
+
pfree(item);
}
@@ -134,6 +138,8 @@ spgAddStartItem(SpGistScanOpaque so, bool isnull)
startEntry->recheck = false;
startEntry->recheckDistances = false;
+ startEntry->leafTuple = NULL;
+
spgAddSearchItemToQueue(so, startEntry);
}
@@ -438,14 +444,30 @@ spgendscan(IndexScanDesc scan)
* Leaf SpGistSearchItem constructor, called in queue context
*/
static SpGistSearchItem *
-spgNewHeapItem(SpGistScanOpaque so, int level, ItemPointer heapPtr,
+spgNewHeapItem(SpGistScanOpaque so, int level, SpGistLeafTuple leafTuple,
Datum leafValue, bool recheck, bool recheckDistances,
bool isnull, double *distances)
{
SpGistSearchItem *item = spgAllocSearchItem(so, isnull, distances);
+ /*
+ * If there are include attributes search item in the queue should contain
+ * them.
+ */
+ if (so->state.includeTupdesc)
+ {
+ Assert(so->state.includeTupdesc->natts);
+
+ item->leafTuple = palloc(leafTuple->size);
+ memcpy(item->leafTuple, leafTuple, leafTuple->size);
+ }
+ else
+ {
+ item->leafTuple = NULL;
+ }
+
item->level = level;
- item->heapPtr = *heapPtr;
+ item->heapPtr = leafTuple->heapPtr;
/* copy value to queue cxt out of tmp cxt */
item->value = isnull ? (Datum) 0 :
datumCopy(leafValue, so->state.attLeafType.attbyval,
@@ -503,6 +525,8 @@ spgLeafTest(SpGistScanOpaque so, SpGistSearchItem *item,
in.returnData = so->want_itup;
in.leafDatum = SGLTDATUM(leafTuple, &so->state);
+
+
out.leafValue = (Datum) 0;
out.recheck = false;
out.distances = NULL;
@@ -528,7 +552,7 @@ spgLeafTest(SpGistScanOpaque so, SpGistSearchItem *item,
/* the scan is ordered -> add the item to the queue */
MemoryContext oldCxt = MemoryContextSwitchTo(so->traversalCxt);
SpGistSearchItem *heapItem = spgNewHeapItem(so, item->level,
- &leafTuple->heapPtr,
+ leafTuple,
leafValue,
recheck,
recheckDistances,
@@ -543,8 +567,10 @@ spgLeafTest(SpGistScanOpaque so, SpGistSearchItem *item,
{
/* non-ordered scan, so report the item right away */
Assert(!recheckDistances);
+
storeRes(so, &leafTuple->heapPtr, leafValue, isnull,
- recheck, false, NULL);
+ recheck, false, NULL, leafTuple);
+
*reportedSome = true;
}
}
@@ -736,7 +762,7 @@ spgTestLeafTuple(SpGistScanOpaque so,
/* dead tuple should be first in chain */
Assert(offset == ItemPointerGetOffsetNumber(&item->heapPtr));
/* No live entries on this page */
- Assert(leafTuple->nextOffset == InvalidOffsetNumber);
+ Assert(SGLT_GET_OFFSET(leafTuple->nextOffset) == InvalidOffsetNumber);
return SpGistBreakOffsetNumber;
}
}
@@ -750,7 +776,7 @@ spgTestLeafTuple(SpGistScanOpaque so,
spgLeafTest(so, item, leafTuple, isnull, reportedSome, storeRes);
- return leafTuple->nextOffset;
+ return SGLT_GET_OFFSET(leafTuple->nextOffset);
}
/*
@@ -782,8 +808,8 @@ redirect:
{
/* We store heap items in the queue only in case of ordered search */
Assert(so->numberOfNonNullOrderBys > 0);
- storeRes(so, &item->heapPtr, item->value, item->isNull,
- item->recheck, item->recheckDistances, item->distances);
+ storeRes(so, &item->heapPtr, item->value, item->isNull, item->recheck,
+ item->recheckDistances, item->distances, item->leafTuple);
reportedSome = true;
}
else
@@ -877,7 +903,7 @@ redirect:
static void
storeBitmap(SpGistScanOpaque so, ItemPointer heapPtr,
Datum leafValue, bool isnull, bool recheck, bool recheckDistances,
- double *distances)
+ double *distances, SpGistLeafTuple leafTuple)
{
Assert(!recheckDistances && !distances);
tbm_add_tuples(so->tbm, heapPtr, 1, recheck);
@@ -904,7 +930,7 @@ spggetbitmap(IndexScanDesc scan, TIDBitmap *tbm)
static void
storeGettuple(SpGistScanOpaque so, ItemPointer heapPtr,
Datum leafValue, bool isnull, bool recheck, bool recheckDistances,
- double *nonNullDistances)
+ double *nonNullDistances, SpGistLeafTuple leafTuple)
{
Assert(so->nPtrs < MaxIndexTuplesPerPage);
so->heapPtrs[so->nPtrs] = *heapPtr;
@@ -949,9 +975,38 @@ storeGettuple(SpGistScanOpaque so, ItemPointer heapPtr,
* Reconstruct index data. We have to copy the datum out of the temp
* context anyway, so we may as well create the tuple here.
*/
- so->reconTups[so->nPtrs] = heap_form_tuple(so->indexTupDesc,
- &leafValue,
- &isnull);
+ if (so->state.includeTupdesc)
+ {
+ /* Add included attributes */
+ Datum *leafDatums;
+ bool *leafIsnulls;
+
+ Assert(so->state.includeTupdesc->natts);
+
+ leafDatums = (Datum *) palloc(sizeof(Datum) * (so->state.includeTupdesc->natts + 1));
+ leafIsnulls = (bool *) palloc(sizeof(bool) * (so->state.includeTupdesc->natts + 1));
+
+ SpGistDeformLeafTuple(leafTuple, &so->state, leafDatums, leafIsnulls, isnull);
+
+ /*
+ * override key value extracted from LeafTuple in case we've
+ * reconstructed it already
+ */
+ leafDatums[0] = leafValue;
+ leafIsnulls[0] = isnull;
+
+ so->reconTups[so->nPtrs] = heap_form_tuple(so->indexTupDesc,
+ leafDatums,
+ leafIsnulls);
+ pfree(leafDatums);
+ pfree(leafIsnulls);
+ }
+ else
+ {
+ so->reconTups[so->nPtrs] = heap_form_tuple(so->indexTupDesc,
+ &leafValue,
+ &isnull);
+ }
}
so->nPtrs++;
}
@@ -1019,6 +1074,10 @@ spgcanreturn(Relation index, int attno)
{
SpGistCache *cache;
+ /* Included attributes always can be fetched for index-only scans */
+ if (attno > 1)
+ return true;
+
/* We can do it if the opclass config function says so */
cache = spgGetCache(index);
diff --git a/src/backend/access/spgist/spgutils.c b/src/backend/access/spgist/spgutils.c
index 0efe05e552..3ca47ff53d 100644
--- a/src/backend/access/spgist/spgutils.c
+++ b/src/backend/access/spgist/spgutils.c
@@ -31,7 +31,18 @@
#include "utils/index_selfuncs.h"
#include "utils/lsyscache.h"
#include "utils/syscache.h"
+#include "access/itup.h"
+#include "access/detoast.h"
+#include "access/toast_internals.h"
+#include "access/heaptoast.h"
+#include "utils/expandeddatum.h"
+/* Does att's datatype allow packing into the 1-byte-header varlena format? */
+#define ATT_IS_PACKABLE(att) \
+ ((att)->attlen == -1 && (att)->attstorage != TYPSTORAGE_PLAIN)
+
+Size spgIncludedDataSize(TupleDesc tupleDesc, Datum *values,
+ bool *isnull, Size start);
/*
* SP-GiST handler function: return IndexAmRoutine with access method parameters
@@ -49,7 +60,7 @@ spghandler(PG_FUNCTION_ARGS)
amroutine->amcanorderbyop = true;
amroutine->amcanbackward = false;
amroutine->amcanunique = false;
- amroutine->amcanmulticol = false;
+ amroutine->amcanmulticol = true;
amroutine->amoptionalkey = true;
amroutine->amsearcharray = false;
amroutine->amsearchnulls = true;
@@ -57,7 +68,7 @@ spghandler(PG_FUNCTION_ARGS)
amroutine->amclusterable = false;
amroutine->ampredlocks = false;
amroutine->amcanparallel = false;
- amroutine->amcaninclude = false;
+ amroutine->amcaninclude = true;
amroutine->amusemaintenanceworkmem = false;
amroutine->amparallelvacuumoptions =
VACUUM_OPTION_PARALLEL_BULKDEL | VACUUM_OPTION_PARALLEL_COND_CLEANUP;
@@ -116,14 +127,21 @@ spgGetCache(Relation index)
cache = MemoryContextAllocZero(index->rd_indexcxt,
sizeof(SpGistCache));
- /* SPGiST doesn't support multi-column indexes */
- Assert(index->rd_att->natts == 1);
+ /*
+ * SPGiST should have one key column and can also have included
+ * columns
+ */
+ if (IndexRelationGetNumberOfKeyAttributes(index) != 1)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("SPGiST index can have only one key column")));
/*
- * Get the actual data type of the indexed column from the index
- * tupdesc. We pass this to the opclass config function so that
- * polymorphic opclasses are possible.
+ * Get the actual data type of the key column from the index tupdesc.
+ * We pass this to the opclass config function so that polymorphic
+ * opclasses are possible.
*/
+
atttype = TupleDescAttr(index->rd_att, 0)->atttypid;
/* Call the config function to get config info for the opclass */
@@ -156,6 +174,7 @@ spgGetCache(Relation index)
fillTypeDesc(&cache->attPrefixType, cache->config.prefixType);
fillTypeDesc(&cache->attLabelType, cache->config.labelType);
+
/* Last, get the lastUsedPages data from the metapage */
metabuffer = ReadBuffer(index, SPGIST_METAPAGE_BLKNO);
LockBuffer(metabuffer, BUFFER_LOCK_SHARE);
@@ -177,7 +196,23 @@ spgGetCache(Relation index)
/* assume it's up to date */
cache = (SpGistCache *) index->rd_amcache;
}
+ /* Form descriptor for included columns if any */
+ if (IndexRelationGetNumberOfAttributes(index) > 1)
+ {
+ int i;
+
+ cache->includeTupdesc = CreateTemplateTupleDesc(
+ IndexRelationGetNumberOfAttributes(index) - 1);
+ for (i = 0; i < IndexRelationGetNumberOfAttributes(index) - 1; i++)
+ {
+ TupleDescInitEntry(cache->includeTupdesc, i + 1, NULL,
+ TupleDescAttr(index->rd_att, i + 1)->atttypid,
+ -1, 0);
+ }
+ }
+ else
+ cache->includeTupdesc = NULL;
return cache;
}
@@ -190,6 +225,7 @@ initSpGistState(SpGistState *state, Relation index)
/* Get cached static information about index */
cache = spgGetCache(index);
+ state->includeTupdesc = cache->includeTupdesc;
state->config = cache->config;
state->attType = cache->attType;
state->attLeafType = cache->attLeafType;
@@ -603,7 +639,7 @@ spgoptions(Datum reloptions, bool validate)
/*
* Get the space needed to store a non-null datum of the indicated type.
- * Note the result is already rounded up to a MAXALIGN boundary.
+ * Note the result is not maxaligned and this should be done by caller if needed.
* Also, we follow the SPGiST convention that pass-by-val types are
* just stored in their Datum representation (compare memcpyDatum).
*/
@@ -619,7 +655,7 @@ SpGistGetTypeSize(SpGistTypeDesc *att, Datum datum)
else
size = VARSIZE_ANY(datum);
- return MAXALIGN(size);
+ return size;
}
/*
@@ -642,36 +678,202 @@ memcpyDatum(void *target, SpGistTypeDesc *att, Datum datum)
}
/*
- * Construct a leaf tuple containing the given heap TID and datum value
+ * Private version of heap_compute_data_size with start address not
+ * necessarily MAXALIGNed. The reason is that start address (and alignment)
+ * influence alignment of each of next values and overall size of included
+ * data area in SpGiST leaf tuple.
+ */
+Size
+spgIncludedDataSize(TupleDesc tupleDesc,
+ Datum *values,
+ bool *isnull, Size start)
+{
+ Size data_length = 0;
+ int i;
+ int numberOfAttributes = tupleDesc->natts;
+
+ data_length = start;
+ for (i = 0; i < numberOfAttributes; i++)
+ {
+ Datum val;
+ Form_pg_attribute atti;
+
+ if (isnull[i])
+ continue;
+
+ val = values[i];
+ atti = TupleDescAttr(tupleDesc, i);
+
+ if (ATT_IS_PACKABLE(atti) &&
+ VARATT_CAN_MAKE_SHORT(DatumGetPointer(val)))
+ {
+ /*
+ * we're anticipating converting to a short varlena header, so
+ * adjust length and don't count any alignment
+ */
+ data_length += VARATT_CONVERTED_SHORT_SIZE(DatumGetPointer(val));
+ }
+ else if (atti->attlen == -1 &&
+ VARATT_IS_EXTERNAL_EXPANDED(DatumGetPointer(val)))
+ {
+ /*
+ * we want to flatten the expanded value so that the constructed
+ * tuple doesn't depend on it
+ */
+ data_length = att_align_nominal(data_length, atti->attalign);
+ data_length += EOH_get_flat_size(DatumGetEOHP(val));
+ }
+ else
+ {
+ data_length = att_align_datum(data_length, atti->attalign,
+ atti->attlen, val);
+ data_length = att_addlength_datum(data_length, atti->attlen,
+ val);
+ }
+ }
+ return data_length - start;
+}
+
+/* Calculate overall leaf tuple size. SGLTHDRSZ is MAXALIGNed only for backward
+ * compatibility and there might be gap between header and key data. After key
+ * data there are no such gaps more than is is necessary for each value
+ * alignment. Overall result is MAXALIGNed.*/
+unsigned int
+SpgLeafSize(SpGistState *state, Datum *datum, bool *isnull)
+{
+ /* compute space needed, nullmask size and offset for include attributes */
+ unsigned int size = SGLTHDRSZ;
+ unsigned int i;
+
+ if (!isnull[0])
+ /* key attribute size (not maxaligned) */
+ size += SpGistGetTypeSize(&state->attLeafType, datum[0]);
+
+ if (state->includeTupdesc)
+ {
+ Assert(state->includeTupdesc->natts);
+ if (state->includeTupdesc->natts + 1 >= INDEX_MAX_KEYS)
+ ereport(ERROR,
+ (errcode(ERRCODE_TOO_MANY_COLUMNS),
+ errmsg("number of index columns (%d) exceeds limit (%d)",
+ state->includeTupdesc->natts, INDEX_MAX_KEYS)));
+ /* nullmask size */
+ for (i = 1; i <= state->includeTupdesc->natts; i++)
+ {
+ if (isnull[i])
+ {
+ size += (state->includeTupdesc->natts / 8) + 1;
+ break;
+ }
+ }
+ /* overall included attributes size each with added proper alignment. */
+ size += spgIncludedDataSize(state->includeTupdesc, datum + 1, isnull + 1, size);
+ }
+ return MAXALIGN(size);
+}
+
+/*
+ * Construct a leaf tuple containing the given heap TID, key data and included
+ * columns data. Key data starts from MAXALIGN boundary for backward compatibility.
+ * Nullmask apply only to included attributes and is placed just after key data if
+ * there is at least one NULL among included attributes. It doesn't need alignment.
+ * Then all included columns data follow aligned by their typealign's.
*/
SpGistLeafTuple
spgFormLeafTuple(SpGistState *state, ItemPointer heapPtr,
- Datum datum, bool isnull)
+ Datum *datum, bool *isnull)
{
SpGistLeafTuple tup;
- unsigned int size;
+ unsigned int size = SGLTHDRSZ;
+ unsigned int include_offset = 0;
+ unsigned int nullmask_size = 0;
+ unsigned int data_offset = 0;
+ unsigned int data_size = 0;
+ uint16 tupmask = 0;
+ int i;
- /* compute space needed (note result is already maxaligned) */
- size = SGLTHDRSZ;
- if (!isnull)
- size += SpGistGetTypeSize(&state->attLeafType, datum);
+ /*
+ * Calculate space needed. If there are include attributes also calculate
+ * sizes and offsets needed for heap_fill_tuple
+ */
+ if (!isnull[0])
+ /* key attribute size (not maxaligned) */
+ size += SpGistGetTypeSize(&state->attLeafType, datum[0]);
+
+ if (state->includeTupdesc)
+ {
+ Assert(state->includeTupdesc->natts);
+ if (state->includeTupdesc->natts + 1 >= INDEX_MAX_KEYS)
+ ereport(ERROR,
+ (errcode(ERRCODE_TOO_MANY_COLUMNS),
+ errmsg("number of index columns (%d) exceeds limit (%d)",
+ state->includeTupdesc->natts, INDEX_MAX_KEYS)));
+
+ include_offset = size;
+
+ for (i = 1; i <= state->includeTupdesc->natts; i++)
+ {
+ if (isnull[i])
+ {
+ nullmask_size = (state->includeTupdesc->natts / 8) + 1;
+ size += nullmask_size;
+ break;
+ }
+ }
+
+ /*
+ * Alignment of all included attributes is counted inside data_size.
+ * data_offset itself is not aligned.
+ */
+ data_size = spgIncludedDataSize(state->includeTupdesc, datum + 1, isnull + 1, size);
+ data_offset = size;
+
+ size += data_size;
+ }
/*
* Ensure that we can replace the tuple with a dead tuple later. This
- * test is unnecessary when !isnull, but let's be safe.
+ * test is unnecessary when !isnull[0], but let's be safe.
*/
if (size < SGDTSIZE)
size = SGDTSIZE;
/* OK, form the tuple */
- tup = (SpGistLeafTuple) palloc0(size);
+ tup = (SpGistLeafTuple) palloc0(MAXALIGN(size));
- tup->size = size;
- tup->nextOffset = InvalidOffsetNumber;
+ tup->size = MAXALIGN(size);
+ SGLT_SET_OFFSET(tup->nextOffset, InvalidOffsetNumber);
tup->heapPtr = *heapPtr;
- if (!isnull)
- memcpyDatum(SGLTDATAPTR(tup), &state->attLeafType, datum);
+ if (!isnull[0])
+ memcpyDatum(SGLTDATAPTR(tup), &state->attLeafType, datum[0]);
+
+ /* Add included columns data to leaf tuple if any. */
+ if (state->includeTupdesc)
+ {
+ /*
+ * The start of include attributes tuple is not aligned by default.
+ * All values alignment should be done by heap_fill_tuple
+ * automaticaly. If there is a nulls mask it is included just after
+ * key attribute data and it should not be aligned.
+ */
+ heap_fill_tuple(state->includeTupdesc, datum + 1, isnull + 1,
+ (char *) tup + data_offset,
+ data_size, &tupmask,
+ (nullmask_size ? (bits8 *) tup + include_offset : NULL));
+
+ if (nullmask_size)
+ SGLT_SET_CONTAINSNULLMASK(tup->nextOffset, 1);
+
+ /*
+ * We do this because heap_fill_tuple wants to initialize a "tupmask"
+ * which is used for HeapTuples, but the only relevant info is the
+ * "has variable attributes" field. We have already set the hasnull
+ * bit above.
+ */
+ if (tupmask & HEAP_HASVARWIDTH)
+ SGLT_SET_CONTAINSVARATT(tup->nextOffset, 1);
+ }
return tup;
}
@@ -688,10 +890,10 @@ spgFormNodeTuple(SpGistState *state, Datum label, bool isnull)
unsigned int size;
unsigned short infomask = 0;
- /* compute space needed (note result is already maxaligned) */
+ /* compute space needed */
size = SGNTHDRSZ;
if (!isnull)
- size += SpGistGetTypeSize(&state->attLabelType, label);
+ size += MAXALIGN(SpGistGetTypeSize(&state->attLabelType, label));
/*
* Here we make sure that the size will fit in the field reserved for it
@@ -735,7 +937,7 @@ spgFormInnerTuple(SpGistState *state, bool hasPrefix, Datum prefix,
/* Compute size needed */
if (hasPrefix)
- prefixSize = SpGistGetTypeSize(&state->attPrefixType, prefix);
+ prefixSize = MAXALIGN(SpGistGetTypeSize(&state->attPrefixType, prefix));
else
prefixSize = 0;
@@ -1046,3 +1248,133 @@ spgproperty(Oid index_oid, int attno,
return true;
}
+
+/*
+ * Convert an SpGist tuple into palloc'd Datum/isnull arrays.
+ *
+ */
+void
+SpGistDeformLeafTuple(SpGistLeafTuple tup, SpGistState *state, Datum *datum, bool *isnull,
+ bool key_isnull)
+{
+ unsigned int include_offset; /* offset of include data */
+ int off;
+ bits8 *nullmask_ptr = NULL; /* ptr to null bitmap in tuple */
+ char *tp;
+ bool slow = false; /* can we use/set attcacheoff? */
+ int i;
+
+ if (key_isnull)
+ {
+ datum[0] = (Datum) 0;
+ isnull[0] = true;
+ }
+ else
+ {
+ datum[0] = SGLTDATUM(tup, state);
+ isnull[0] = false;
+ }
+
+ if (state->includeTupdesc)
+ {
+ Assert(state->includeTupdesc->natts);
+ if (state->includeTupdesc->natts + 1 >= INDEX_MAX_KEYS)
+ ereport(ERROR,
+ (errcode(ERRCODE_TOO_MANY_COLUMNS),
+ errmsg("number of index columns (%d) exceeds limit (%d)",
+ state->includeTupdesc->natts, INDEX_MAX_KEYS)));
+
+ include_offset = key_isnull ? SGLTHDRSZ : SGLTHDRSZ + SpGistGetTypeSize(&state->attLeafType, datum[0]);
+
+ tp = (char *) tup;
+ off = include_offset;
+
+ if (SGLT_GET_CONTAINSNULLMASK(tup->nextOffset))
+ {
+ nullmask_ptr = (bits8 *) tp + include_offset;
+ off += (state->includeTupdesc->natts) / 8 + 1;
+ }
+
+ if (state->attLeafType.attlen > 0 && !SGLT_GET_CONTAINSVARATT(tup->nextOffset) &&
+ !SGLT_GET_CONTAINSNULLMASK(tup->nextOffset))
+ /* can use attcacheoff for all attributes */
+ {
+ for (i = 1; i <= state->includeTupdesc->natts; i++)
+ {
+ Form_pg_attribute thisatt = TupleDescAttr(state->includeTupdesc, i - 1);
+
+ isnull[i] = false;
+ if (thisatt->attcacheoff >= 0)
+ off = thisatt->attcacheoff;
+ else
+ {
+ off = att_align_nominal(off, thisatt->attalign);
+ thisatt->attcacheoff = off;
+ }
+ datum[i] = fetchatt(thisatt, tp + off);
+ off = att_addlength_pointer(off, thisatt->attlen, tp + off);
+ }
+ }
+ else
+
+ /*
+ * general case: can use cache until first null or varlen
+ * attribute
+ */
+ {
+ if (state->attLeafType.attlen <= 0)
+ slow = true; /* can't use attcacheoff at all */
+
+ for (i = 1; i <= state->includeTupdesc->natts; i++)
+ {
+ Form_pg_attribute thisatt = TupleDescAttr(state->includeTupdesc, i - 1);
+
+ if (SGLT_GET_CONTAINSNULLMASK(tup->nextOffset))
+ {
+ if (att_isnull(i - 1, nullmask_ptr))
+ {
+ datum[i] = (Datum) 0;
+ isnull[i] = true;
+ slow = true; /* can't use attcacheoff anymore */
+ continue;
+ }
+ }
+
+ isnull[i] = false;
+
+ if (!slow && thisatt->attcacheoff >= 0)
+ off = thisatt->attcacheoff;
+ else if (thisatt->attlen == -1)
+ {
+ /*
+ * We can only cache the offset for a varlena attribute if
+ * the offset is already suitably aligned, so that there
+ * would be no pad bytes in any case: then the offset will
+ * be valid for either an aligned or unaligned value.
+ */
+ if (!slow && off == att_align_nominal(off, thisatt->attalign))
+ thisatt->attcacheoff = off;
+ else
+ {
+ off = att_align_pointer(off, thisatt->attalign, -1, tp + off);
+ slow = true;
+ }
+ }
+ else
+ {
+ /* not varlena, so safe to use att_align_nominal */
+ off = att_align_nominal(off, thisatt->attalign);
+
+ if (!slow)
+ thisatt->attcacheoff = off;
+ }
+
+ datum[i] = fetchatt(thisatt, tp + off);
+ off = att_addlength_pointer(off, thisatt->attlen, tp + off);
+
+ if (thisatt->attlen <= 0)
+ slow = true; /* can't use attcacheoff anymore */
+ }
+ }
+ }
+}
diff --git a/src/backend/access/spgist/spgvacuum.c b/src/backend/access/spgist/spgvacuum.c
index bd98707f3c..a0d76901fc 100644
--- a/src/backend/access/spgist/spgvacuum.c
+++ b/src/backend/access/spgist/spgvacuum.c
@@ -168,23 +168,28 @@ vacuumLeafPage(spgBulkDeleteState *bds, Relation index, Buffer buffer,
}
/* Form predecessor map, too */
- if (lt->nextOffset != InvalidOffsetNumber)
+ if (SGLT_GET_OFFSET(lt->nextOffset) != InvalidOffsetNumber)
{
/* paranoia about corrupted chain links */
- if (lt->nextOffset < FirstOffsetNumber ||
- lt->nextOffset > max ||
- predecessor[lt->nextOffset] != InvalidOffsetNumber)
+ if (SGLT_GET_OFFSET(lt->nextOffset) < FirstOffsetNumber ||
+ SGLT_GET_OFFSET(lt->nextOffset) > max ||
+ predecessor[SGLT_GET_OFFSET(lt->nextOffset)] != InvalidOffsetNumber)
elog(ERROR, "inconsistent tuple chain links in page %u of index \"%s\"",
BufferGetBlockNumber(buffer),
RelationGetRelationName(index));
- predecessor[lt->nextOffset] = i;
+ predecessor[SGLT_GET_OFFSET(lt->nextOffset)] = i;
}
}
else if (lt->tupstate == SPGIST_REDIRECT)
{
SpGistDeadTuple dt = (SpGistDeadTuple) lt;
- Assert(dt->nextOffset == InvalidOffsetNumber);
+ /*
+ * Dead tuple nextOffset is allowed to have any values of two
+ * highest bits in case it is inherited from SpGistLeafTuple where
+ * these bits has their own meaning.
+ */
+ Assert(SGLT_GET_OFFSET(dt->nextOffset) == InvalidOffsetNumber);
Assert(ItemPointerIsValid(&dt->pointer));
/*
@@ -201,7 +206,7 @@ vacuumLeafPage(spgBulkDeleteState *bds, Relation index, Buffer buffer,
}
else
{
- Assert(lt->nextOffset == InvalidOffsetNumber);
+ Assert(SGLT_GET_OFFSET(lt->nextOffset) == InvalidOffsetNumber);
}
}
@@ -250,7 +255,7 @@ vacuumLeafPage(spgBulkDeleteState *bds, Relation index, Buffer buffer,
prevLive = deletable[i] ? InvalidOffsetNumber : i;
/* scan down the chain ... */
- j = head->nextOffset;
+ j = SGLT_GET_OFFSET(head->nextOffset);
while (j != InvalidOffsetNumber)
{
SpGistLeafTuple lt;
@@ -301,7 +306,7 @@ vacuumLeafPage(spgBulkDeleteState *bds, Relation index, Buffer buffer,
interveningDeletable = false;
}
- j = lt->nextOffset;
+ j = SGLT_GET_OFFSET(lt->nextOffset);
}
if (prevLive == InvalidOffsetNumber)
@@ -366,7 +371,7 @@ vacuumLeafPage(spgBulkDeleteState *bds, Relation index, Buffer buffer,
lt = (SpGistLeafTuple) PageGetItem(page,
PageGetItemId(page, chainSrc[i]));
Assert(lt->tupstate == SPGIST_LIVE);
- lt->nextOffset = chainDest[i];
+ SGLT_SET_OFFSET(lt->nextOffset, chainDest[i]);
}
MarkBufferDirty(buffer);
diff --git a/src/backend/access/spgist/spgxlog.c b/src/backend/access/spgist/spgxlog.c
index 7be2291d07..4022e3af07 100644
--- a/src/backend/access/spgist/spgxlog.c
+++ b/src/backend/access/spgist/spgxlog.c
@@ -122,8 +122,8 @@ spgRedoAddLeaf(XLogReaderState *record)
head = (SpGistLeafTuple) PageGetItem(page,
PageGetItemId(page, xldata->offnumHeadLeaf));
- Assert(head->nextOffset == leafTupleHdr.nextOffset);
- head->nextOffset = xldata->offnumLeaf;
+ Assert(SGLT_GET_OFFSET(head->nextOffset) == SGLT_GET_OFFSET(leafTupleHdr.nextOffset));
+ SGLT_SET_OFFSET(head->nextOffset, xldata->offnumLeaf);
}
}
else
@@ -822,7 +822,7 @@ spgRedoVacuumLeaf(XLogReaderState *record)
lt = (SpGistLeafTuple) PageGetItem(page,
PageGetItemId(page, chainSrc[i]));
Assert(lt->tupstate == SPGIST_LIVE);
- lt->nextOffset = chainDest[i];
+ SGLT_SET_OFFSET(lt->nextOffset, chainDest[i]);
}
PageSetLSN(page, lsn);
diff --git a/src/include/access/spgist_private.h b/src/include/access/spgist_private.h
index 00b98ec6a0..8d03adb8f5 100644
--- a/src/include/access/spgist_private.h
+++ b/src/include/access/spgist_private.h
@@ -141,6 +141,7 @@ typedef struct SpGistState
SpGistTypeDesc attLeafType; /* type of leaf-tuple values */
SpGistTypeDesc attPrefixType; /* type of inner-tuple prefix values */
SpGistTypeDesc attLabelType; /* type of node label values */
+ TupleDesc includeTupdesc; /* tuple descriptor of included columns */
char *deadTupleStorage; /* workspace for spgFormDeadTuple */
@@ -148,6 +149,98 @@ typedef struct SpGistState
bool isBuild; /* true if doing index build */
} SpGistState;
+/*
+ * SPGiST leaf tuple: carries a datum and a heap tuple TID
+ *
+ * In the simplest case, the datum is the same as the indexed value; but
+ * it could also be a suffix or some other sort of delta that permits
+ * reconstruction given knowledge of the prefix path traversed to get here.
+ *
+ * The size field is wider than could possibly be needed for an on-disk leaf
+ * tuple, but this allows us to form leaf tuples even when the datum is too
+ * wide to be stored immediately, and it costs nothing because of alignment
+ * considerations.
+ *
+ * Normally, nextOffset links to the next tuple belonging to the same parent
+ * node (which must be on the same page). But when the root page is a leaf
+ * page, we don't chain its tuples, so nextOffset is always 0 on the root.
+ *
+ * size must be a multiple of MAXALIGN; also, it must be at least SGDTSIZE
+ * so that the tuple can be converted to REDIRECT status later. (This
+ * restriction only adds bytes for the null-datum case, otherwise alignment
+ * restrictions force it anyway.)
+ *
+ * In a leaf tuple for a NULL indexed value, there's no useful datum value;
+ * however, the SGDTSIZE limit ensures that's there's a Datum word there
+ * anyway, so SGLTDATUM can be applied safely as long as you don't do
+ * anything with the result.
+ *
+ * Minimum space to store SpGistLeafTuple on a page is 12 bytes tuple header
+ * and 4 bytes ItemIdData so 14 lower bits of nextOffset (accessed as
+ * SGLT_GET/SET_OFFSET) is enough to store actual tuple number on a page even
+ * if page size is 64Kb. Two higher bits are to store per-tuple
+ * information is there nulls mask exist and is there any included attribute
+ * of variable length type.
+ */
+
+typedef struct SpGistLeafTupleData
+{
+ unsigned int tupstate:2, /* LIVE/REDIRECT/DEAD/PLACEHOLDER */
+ size:30; /* large enough for any palloc'able value */
+ OffsetNumber nextOffset; /* higher 1 bit = 1 if included values has
+ * nulls, 2 bit = 1 if included values contain
+ * variable length values, lower 15 bits - is
+ * "actual" nextOffset i.e. number of next
+ * tuple in chain on a page, or
+ * InvalidOffsetNumber. They SHOULD NOT be
+ * set/read directly,
+ * SGLT_SET_XXX/SGLT_GET_XXX macros must be
+ * used instead. */
+ ItemPointerData heapPtr; /* TID of represented heap tuple */
+ /* leaf datum follows */
+
+ /*
+ * if SGLT_GET_CONTAINSNULLMASK nullmask follows. Its size (number of
+ * included columns/8)+1
+ */
+ /* include attributes follow if any */
+} SpGistLeafTupleData;
+
+typedef SpGistLeafTupleData *SpGistLeafTuple;
+
+#define SGLTHDRSZ MAXALIGN(sizeof(SpGistLeafTupleData))
+#define SGLTDATAPTR(x) (((char *) (x)) + SGLTHDRSZ)
+#define SGLTDATUM(x, s) ((s)->attLeafType.attbyval ? \
+ *(Datum *) SGLTDATAPTR(x) : \
+ PointerGetDatum(SGLTDATAPTR(x)))
+/*
+ * Accessor macros to get and set actual 14-bit offset and two bit flags from/to
+ * nextOffset value.
+ */
+#define SGLT_GET_OFFSET(x) ( (x) & 0x3FFF )
+#define SGLT_GET_CONTAINSNULLMASK(x) ( (x) >> 15 )
+#define SGLT_GET_CONTAINSVARATT(x) ( ( (x) & 4000 ) >> 14 )
+#define SGLT_SET_OFFSET(x,o) ( (x) = ( (x) & 0xC000 ) | ( (o) & 0x3FFF) )
+#define SGLT_SET_CONTAINSNULLMASK(x,n) ( (x) = ( (n) << 15 ) | ( (x) & 0x3FFF ) )
+#define SGLT_SET_CONTAINSVARATT(x,v) ( (x) = ( (v) << 14 ) | ( (x) & 0xBFFF ) )
+
+#define SGLT_GET_INCLUDE_TUPSIZE(x) SGLT_GET_OFFSET(x)
+#define SGLT_SET_INCLUDE_TUPSIZE(x,o) SGLT_SET_OFFSET(x,o)
+
+extern char *SpGistFormIncludeTuple(TupleDesc tupleDescriptor, Datum *values,
+ bool *isnull, uint16 *tupdata);
+
+/*
+ * SPGiST dead tuple: declaration for examining non-live tuples
+ *
+ * The tupstate field of this struct must match those of regular inner and
+ * leaf tuples, and its size field must match a leaf tuple's.
+ * Also, the pointer field must be in the same place as a leaf tuple's heapPtr
+ * field, to satisfy some Asserts that we make when replacing a leaf tuple
+ * with a dead tuple.
+ * We don't use nextOffset, but it's needed to align the pointer field.
+ */
+
typedef struct SpGistSearchItem
{
pairingheap_node phNode; /* pairing heap node */
@@ -160,14 +253,14 @@ typedef struct SpGistSearchItem
bool isLeaf; /* SearchItem is heap item */
bool recheck; /* qual recheck is needed */
bool recheckDistances; /* distance recheck is needed */
-
+ SpGistLeafTuple leafTuple;
/* array with numberOfOrderBys entries */
double distances[FLEXIBLE_ARRAY_MEMBER];
+ /* if there are include columns SpGistLeafTupleData follow */
} SpGistSearchItem;
#define SizeOfSpGistSearchItem(n_distances) \
(offsetof(SpGistSearchItem, distances) + sizeof(double) * (n_distances))
-
/*
* Private state of an index scan
*/
@@ -241,6 +334,7 @@ typedef struct SpGistCache
SpGistTypeDesc attLeafType; /* type of leaf-tuple values */
SpGistTypeDesc attPrefixType; /* type of inner-tuple prefix values */
SpGistTypeDesc attLabelType; /* type of node label values */
+ TupleDesc includeTupdesc;
SpGistLUPCache lastUsedPages; /* local storage of last-used info */
} SpGistCache;
@@ -321,60 +415,6 @@ typedef SpGistNodeTupleData *SpGistNodeTuple;
*(Datum *) SGNTDATAPTR(x) : \
PointerGetDatum(SGNTDATAPTR(x)))
-/*
- * SPGiST leaf tuple: carries a datum and a heap tuple TID
- *
- * In the simplest case, the datum is the same as the indexed value; but
- * it could also be a suffix or some other sort of delta that permits
- * reconstruction given knowledge of the prefix path traversed to get here.
- *
- * The size field is wider than could possibly be needed for an on-disk leaf
- * tuple, but this allows us to form leaf tuples even when the datum is too
- * wide to be stored immediately, and it costs nothing because of alignment
- * considerations.
- *
- * Normally, nextOffset links to the next tuple belonging to the same parent
- * node (which must be on the same page). But when the root page is a leaf
- * page, we don't chain its tuples, so nextOffset is always 0 on the root.
- *
- * size must be a multiple of MAXALIGN; also, it must be at least SGDTSIZE
- * so that the tuple can be converted to REDIRECT status later. (This
- * restriction only adds bytes for the null-datum case, otherwise alignment
- * restrictions force it anyway.)
- *
- * In a leaf tuple for a NULL indexed value, there's no useful datum value;
- * however, the SGDTSIZE limit ensures that's there's a Datum word there
- * anyway, so SGLTDATUM can be applied safely as long as you don't do
- * anything with the result.
- */
-typedef struct SpGistLeafTupleData
-{
- unsigned int tupstate:2, /* LIVE/REDIRECT/DEAD/PLACEHOLDER */
- size:30; /* large enough for any palloc'able value */
- OffsetNumber nextOffset; /* next tuple in chain, or InvalidOffsetNumber */
- ItemPointerData heapPtr; /* TID of represented heap tuple */
- /* leaf datum follows */
-} SpGistLeafTupleData;
-
-typedef SpGistLeafTupleData *SpGistLeafTuple;
-
-#define SGLTHDRSZ MAXALIGN(sizeof(SpGistLeafTupleData))
-#define SGLTDATAPTR(x) (((char *) (x)) + SGLTHDRSZ)
-#define SGLTDATUM(x, s) ((s)->attLeafType.attbyval ? \
- *(Datum *) SGLTDATAPTR(x) : \
- PointerGetDatum(SGLTDATAPTR(x)))
-
-/*
- * SPGiST dead tuple: declaration for examining non-live tuples
- *
- * The tupstate field of this struct must match those of regular inner and
- * leaf tuples, and its size field must match a leaf tuple's.
- * Also, the pointer field must be in the same place as a leaf tuple's heapPtr
- * field, to satisfy some Asserts that we make when replacing a leaf tuple
- * with a dead tuple.
- * We don't use nextOffset, but it's needed to align the pointer field.
- * pointer and xid are only valid when tupstate = REDIRECT.
- */
typedef struct SpGistDeadTupleData
{
unsigned int tupstate:2, /* LIVE/REDIRECT/DEAD/PLACEHOLDER */
@@ -394,7 +434,6 @@ typedef SpGistDeadTupleData *SpGistDeadTuple;
* size plus sizeof(ItemIdData) (for the line pointer). This works correctly
* so long as tuple sizes are always maxaligned.
*/
-
/* Page capacity after allowing for fixed header and special space */
#define SPGIST_PAGE_CAPACITY \
MAXALIGN_DOWN(BLCKSZ - \
@@ -456,9 +495,10 @@ extern void SpGistInitPage(Page page, uint16 f);
extern void SpGistInitBuffer(Buffer b, uint16 f);
extern void SpGistInitMetapage(Page page);
extern unsigned int SpGistGetTypeSize(SpGistTypeDesc *att, Datum datum);
+extern unsigned int SpgLeafSize(SpGistState *state, Datum *datum, bool *isnull);
extern SpGistLeafTuple spgFormLeafTuple(SpGistState *state,
ItemPointer heapPtr,
- Datum datum, bool isnull);
+ Datum *datum, bool *isnull);
extern SpGistNodeTuple spgFormNodeTuple(SpGistState *state,
Datum label, bool isnull);
extern SpGistInnerTuple spgFormInnerTuple(SpGistState *state,
@@ -466,6 +506,8 @@ extern SpGistInnerTuple spgFormInnerTuple(SpGistState *state,
int nNodes, SpGistNodeTuple *nodes);
extern SpGistDeadTuple spgFormDeadTuple(SpGistState *state, int tupstate,
BlockNumber blkno, OffsetNumber offnum);
+extern void SpGistDeformLeafTuple(SpGistLeafTuple tup, SpGistState *state,
+ Datum *datum, bool *isnull, bool key_value_isnull);
extern Datum *spgExtractNodeLabels(SpGistState *state,
SpGistInnerTuple innerTuple);
extern OffsetNumber SpGistPageAddNewItem(SpGistState *state, Page page,
@@ -484,7 +526,7 @@ extern void spgPageIndexMultiDelete(SpGistState *state, Page page,
int firststate, int reststate,
BlockNumber blkno, OffsetNumber offnum);
extern bool spgdoinsert(Relation index, SpGistState *state,
- ItemPointer heapPtr, Datum datum, bool isnull);
+ ItemPointer heapPtr, Datum *datum, bool *isnull);
/* spgproc.c */
extern double *spg_key_orderbys_distances(Datum key, bool isLeaf,
diff --git a/src/test/regress/expected/amutils.out b/src/test/regress/expected/amutils.out
index d92a6d12c6..93e6a43b6d 100644
--- a/src/test/regress/expected/amutils.out
+++ b/src/test/regress/expected/amutils.out
@@ -169,9 +169,9 @@ select amname, prop, pg_indexam_has_property(a.oid, prop) as p
hash | bogus |
spgist | can_order | f
spgist | can_unique | f
- spgist | can_multi_col | f
+ spgist | can_multi_col | t
spgist | can_exclude | t
- spgist | can_include | f
+ spgist | can_include | t
spgist | bogus |
(36 rows)
diff --git a/src/test/regress/expected/index_including.out b/src/test/regress/expected/index_including.out
index 8e5d53e712..4fd2b7e878 100644
--- a/src/test/regress/expected/index_including.out
+++ b/src/test/regress/expected/index_including.out
@@ -356,7 +356,6 @@ CREATE INDEX on tbl USING brin(c1, c2) INCLUDE (c3, c4);
ERROR: access method "brin" does not support included columns
CREATE INDEX on tbl USING gist(c3) INCLUDE (c1, c4);
CREATE INDEX on tbl USING spgist(c3) INCLUDE (c4);
-ERROR: access method "spgist" does not support included columns
CREATE INDEX on tbl USING gin(c1, c2) INCLUDE (c3, c4);
ERROR: access method "gin" does not support included columns
CREATE INDEX on tbl USING hash(c1, c2) INCLUDE (c3, c4);
diff --git a/src/test/regress/expected/index_including_spgist.out b/src/test/regress/expected/index_including_spgist.out
new file mode 100644
index 0000000000..fa64766fb7
--- /dev/null
+++ b/src/test/regress/expected/index_including_spgist.out
@@ -0,0 +1,139 @@
+/*
+ * 1.1. test CREATE INDEX with buffered build
+ */
+-- Regular index with included columns
+CREATE TABLE tbl_spgist (c1 int, c2 int, c3 int, c4 box);
+-- size is chosen to exceed page size and trigger actual truncation
+INSERT INTO tbl_spgist SELECT x, 2*x, 3*x, box(point(x,x+1),point(2*x,2*x+1)) FROM generate_series(1,8000) AS x;
+CREATE INDEX tbl_spgist_idx ON tbl_spgist using spgist (c4) INCLUDE (c1,c2,c3);
+SELECT pg_get_indexdef(i.indexrelid)
+FROM pg_index i JOIN pg_class c ON i.indexrelid = c.oid
+WHERE i.indrelid = 'tbl_spgist'::regclass ORDER BY c.relname;
+ pg_get_indexdef
+-----------------------------------------------------------------------------------------
+ CREATE INDEX tbl_spgist_idx ON public.tbl_spgist USING spgist (c4) INCLUDE (c1, c2, c3)
+(1 row)
+
+SELECT * FROM tbl_spgist where c4 <@ box(point(1,1),point(10,10));
+ c1 | c2 | c3 | c4
+----+----+----+-------------
+ 1 | 2 | 3 | (2,3),(1,2)
+ 2 | 4 | 6 | (4,5),(2,3)
+ 3 | 6 | 9 | (6,7),(3,4)
+ 4 | 8 | 12 | (8,9),(4,5)
+(4 rows)
+
+SET enable_bitmapscan TO off;
+EXPLAIN (costs off) SELECT * FROM tbl_spgist where c4 <@ box(point(1,1),point(10,10));
+ QUERY PLAN
+----------------------------------------------------
+ Index Only Scan using tbl_spgist_idx on tbl_spgist
+ Index Cond: (c4 <@ '(10,10),(1,1)'::box)
+(2 rows)
+
+SET enable_bitmapscan TO default;
+DROP TABLE tbl_spgist;
+/*
+ * 1.2. test CREATE INDEX with inserts
+ */
+-- Regular index with included columns
+CREATE TABLE tbl_spgist (c1 int, c2 int, c3 int, c4 box);
+-- size is chosen to exceed page size and trigger actual truncation
+CREATE INDEX tbl_spgist_idx ON tbl_spgist using spgist (c4) INCLUDE (c1,c2,c3);
+INSERT INTO tbl_spgist SELECT x, 2*x, 3*x, box(point(x,x+1),point(2*x,2*x+1)) FROM generate_series(1,8000) AS x;
+SELECT pg_get_indexdef(i.indexrelid)
+FROM pg_index i JOIN pg_class c ON i.indexrelid = c.oid
+WHERE i.indrelid = 'tbl_spgist'::regclass ORDER BY c.relname;
+ pg_get_indexdef
+-----------------------------------------------------------------------------------------
+ CREATE INDEX tbl_spgist_idx ON public.tbl_spgist USING spgist (c4) INCLUDE (c1, c2, c3)
+(1 row)
+
+SELECT * FROM tbl_spgist where c4 <@ box(point(1,1),point(10,10));
+ c1 | c2 | c3 | c4
+----+----+----+-------------
+ 1 | 2 | 3 | (2,3),(1,2)
+ 2 | 4 | 6 | (4,5),(2,3)
+ 3 | 6 | 9 | (6,7),(3,4)
+ 4 | 8 | 12 | (8,9),(4,5)
+(4 rows)
+
+SET enable_bitmapscan TO off;
+EXPLAIN (costs off) SELECT * FROM tbl_spgist where c4 <@ box(point(1,1),point(10,10));
+ QUERY PLAN
+----------------------------------------------------
+ Index Only Scan using tbl_spgist_idx on tbl_spgist
+ Index Cond: (c4 <@ '(10,10),(1,1)'::box)
+(2 rows)
+
+SET enable_bitmapscan TO default;
+DROP TABLE tbl_spgist;
+/*
+ * 2. CREATE INDEX CONCURRENTLY
+ */
+CREATE TABLE tbl_spgist (c1 int, c2 int, c3 int, c4 box);
+INSERT INTO tbl_spgist SELECT x, 2*x, 3*x, box(point(x,x+1),point(2*x,2*x+1)) FROM generate_series(1,10) AS x;
+CREATE INDEX CONCURRENTLY tbl_spgist_idx ON tbl_spgist using spgist (c4) INCLUDE (c1,c2,c3);
+SELECT indexdef FROM pg_indexes WHERE tablename = 'tbl_spgist' ORDER BY indexname;
+ indexdef
+-----------------------------------------------------------------------------------------
+ CREATE INDEX tbl_spgist_idx ON public.tbl_spgist USING spgist (c4) INCLUDE (c1, c2, c3)
+(1 row)
+
+DROP TABLE tbl_spgist;
+/*
+ * 3. REINDEX
+ */
+CREATE TABLE tbl_spgist (c1 int, c2 int, c3 int, c4 box);
+INSERT INTO tbl_spgist SELECT x, 2*x, 3*x, box(point(x,x+1),point(2*x,2*x+1)) FROM generate_series(1,10) AS x;
+CREATE INDEX tbl_spgist_idx ON tbl_spgist using spgist (c4) INCLUDE (c1,c3);
+SELECT indexdef FROM pg_indexes WHERE tablename = 'tbl_spgist' ORDER BY indexname;
+ indexdef
+-------------------------------------------------------------------------------------
+ CREATE INDEX tbl_spgist_idx ON public.tbl_spgist USING spgist (c4) INCLUDE (c1, c3)
+(1 row)
+
+REINDEX INDEX tbl_spgist_idx;
+SELECT indexdef FROM pg_indexes WHERE tablename = 'tbl_spgist' ORDER BY indexname;
+ indexdef
+-------------------------------------------------------------------------------------
+ CREATE INDEX tbl_spgist_idx ON public.tbl_spgist USING spgist (c4) INCLUDE (c1, c3)
+(1 row)
+
+ALTER TABLE tbl_spgist DROP COLUMN c1;
+SELECT indexdef FROM pg_indexes WHERE tablename = 'tbl_spgist' ORDER BY indexname;
+ indexdef
+----------
+(0 rows)
+
+DROP TABLE tbl_spgist;
+/*
+ * 4. Update, delete values in indexed table.
+ */
+CREATE TABLE tbl_spgist (c1 int, c2 int, c3 int, c4 box);
+INSERT INTO tbl_spgist SELECT x, 2*x, 3*x, box(point(x,x+1),point(2*x,2*x+1)) FROM generate_series(1,10) AS x;
+CREATE INDEX tbl_spgist_idx ON tbl_spgist using spgist (c4) INCLUDE (c1,c3);
+UPDATE tbl_spgist SET c1 = 100 WHERE c1 = 2;
+UPDATE tbl_spgist SET c1 = 1 WHERE c1 = 3;
+DELETE FROM tbl_spgist WHERE c1 = 5 OR c3 = 12;
+DROP TABLE tbl_spgist;
+/*
+ * 5. Alter column type.
+ */
+CREATE TABLE tbl_spgist (c1 int, c2 int, c3 int, c4 box);
+INSERT INTO tbl_spgist SELECT x, 2*x, 3*x, box(point(x,x+1),point(2*x,2*x+1)) FROM generate_series(1,10) AS x;
+CREATE INDEX tbl_spgist_idx ON tbl_spgist using spgist (c4) INCLUDE (c1,c3);
+ALTER TABLE tbl_spgist ALTER c1 TYPE bigint;
+ALTER TABLE tbl_spgist ALTER c3 TYPE bigint;
+\d tbl_spgist
+ Table "public.tbl_spgist"
+ Column | Type | Collation | Nullable | Default
+--------+---------+-----------+----------+---------
+ c1 | bigint | | |
+ c2 | integer | | |
+ c3 | bigint | | |
+ c4 | box | | |
+Indexes:
+ "tbl_spgist_idx" spgist (c4) INCLUDE (c1, c3)
+
+DROP TABLE tbl_spgist;
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 026ea880cd..985458a1a8 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -50,7 +50,7 @@ test: copy copyselect copydml insert insert_conflict
# ----------
test: create_misc create_operator create_procedure
# These depend on create_misc and create_operator
-test: create_index create_index_spgist create_view index_including index_including_gist
+test: create_index create_index_spgist create_view index_including index_including_gist index_including_spgist
# ----------
# Another group of parallel tests
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index 979d926119..f3df961535 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -68,6 +68,7 @@ test: create_index_spgist
test: create_view
test: index_including
test: index_including_gist
+test: index_including_spgist
test: create_aggregate
test: create_function_3
test: create_cast
diff --git a/src/test/regress/sql/index_including_spgist.sql b/src/test/regress/sql/index_including_spgist.sql
new file mode 100644
index 0000000000..a59e73aa22
--- /dev/null
+++ b/src/test/regress/sql/index_including_spgist.sql
@@ -0,0 +1,81 @@
+/*
+ * 1.1. test CREATE INDEX with buffered build
+ */
+
+-- Regular index with included columns
+CREATE TABLE tbl_spgist (c1 int, c2 int, c3 int, c4 box);
+-- size is chosen to exceed page size and trigger actual truncation
+INSERT INTO tbl_spgist SELECT x, 2*x, 3*x, box(point(x,x+1),point(2*x,2*x+1)) FROM generate_series(1,8000) AS x;
+CREATE INDEX tbl_spgist_idx ON tbl_spgist using spgist (c4) INCLUDE (c1,c2,c3);
+SELECT pg_get_indexdef(i.indexrelid)
+FROM pg_index i JOIN pg_class c ON i.indexrelid = c.oid
+WHERE i.indrelid = 'tbl_spgist'::regclass ORDER BY c.relname;
+SELECT * FROM tbl_spgist where c4 <@ box(point(1,1),point(10,10));
+SET enable_bitmapscan TO off;
+EXPLAIN (costs off) SELECT * FROM tbl_spgist where c4 <@ box(point(1,1),point(10,10));
+SET enable_bitmapscan TO default;
+DROP TABLE tbl_spgist;
+
+/*
+ * 1.2. test CREATE INDEX with inserts
+ */
+
+-- Regular index with included columns
+CREATE TABLE tbl_spgist (c1 int, c2 int, c3 int, c4 box);
+-- size is chosen to exceed page size and trigger actual truncation
+CREATE INDEX tbl_spgist_idx ON tbl_spgist using spgist (c4) INCLUDE (c1,c2,c3);
+INSERT INTO tbl_spgist SELECT x, 2*x, 3*x, box(point(x,x+1),point(2*x,2*x+1)) FROM generate_series(1,8000) AS x;
+SELECT pg_get_indexdef(i.indexrelid)
+FROM pg_index i JOIN pg_class c ON i.indexrelid = c.oid
+WHERE i.indrelid = 'tbl_spgist'::regclass ORDER BY c.relname;
+SELECT * FROM tbl_spgist where c4 <@ box(point(1,1),point(10,10));
+SET enable_bitmapscan TO off;
+EXPLAIN (costs off) SELECT * FROM tbl_spgist where c4 <@ box(point(1,1),point(10,10));
+SET enable_bitmapscan TO default;
+DROP TABLE tbl_spgist;
+
+/*
+ * 2. CREATE INDEX CONCURRENTLY
+ */
+CREATE TABLE tbl_spgist (c1 int, c2 int, c3 int, c4 box);
+INSERT INTO tbl_spgist SELECT x, 2*x, 3*x, box(point(x,x+1),point(2*x,2*x+1)) FROM generate_series(1,10) AS x;
+CREATE INDEX CONCURRENTLY tbl_spgist_idx ON tbl_spgist using spgist (c4) INCLUDE (c1,c2,c3);
+SELECT indexdef FROM pg_indexes WHERE tablename = 'tbl_spgist' ORDER BY indexname;
+DROP TABLE tbl_spgist;
+
+
+/*
+ * 3. REINDEX
+ */
+CREATE TABLE tbl_spgist (c1 int, c2 int, c3 int, c4 box);
+INSERT INTO tbl_spgist SELECT x, 2*x, 3*x, box(point(x,x+1),point(2*x,2*x+1)) FROM generate_series(1,10) AS x;
+CREATE INDEX tbl_spgist_idx ON tbl_spgist using spgist (c4) INCLUDE (c1,c3);
+SELECT indexdef FROM pg_indexes WHERE tablename = 'tbl_spgist' ORDER BY indexname;
+REINDEX INDEX tbl_spgist_idx;
+SELECT indexdef FROM pg_indexes WHERE tablename = 'tbl_spgist' ORDER BY indexname;
+ALTER TABLE tbl_spgist DROP COLUMN c1;
+SELECT indexdef FROM pg_indexes WHERE tablename = 'tbl_spgist' ORDER BY indexname;
+DROP TABLE tbl_spgist;
+
+/*
+ * 4. Update, delete values in indexed table.
+ */
+CREATE TABLE tbl_spgist (c1 int, c2 int, c3 int, c4 box);
+INSERT INTO tbl_spgist SELECT x, 2*x, 3*x, box(point(x,x+1),point(2*x,2*x+1)) FROM generate_series(1,10) AS x;
+CREATE INDEX tbl_spgist_idx ON tbl_spgist using spgist (c4) INCLUDE (c1,c3);
+UPDATE tbl_spgist SET c1 = 100 WHERE c1 = 2;
+UPDATE tbl_spgist SET c1 = 1 WHERE c1 = 3;
+DELETE FROM tbl_spgist WHERE c1 = 5 OR c3 = 12;
+DROP TABLE tbl_spgist;
+
+/*
+ * 5. Alter column type.
+ */
+CREATE TABLE tbl_spgist (c1 int, c2 int, c3 int, c4 box);
+INSERT INTO tbl_spgist SELECT x, 2*x, 3*x, box(point(x,x+1),point(2*x,2*x+1)) FROM generate_series(1,10) AS x;
+CREATE INDEX tbl_spgist_idx ON tbl_spgist using spgist (c4) INCLUDE (c1,c3);
+ALTER TABLE tbl_spgist ALTER c1 TYPE bigint;
+ALTER TABLE tbl_spgist ALTER c3 TYPE bigint;
+\d tbl_spgist
+DROP TABLE tbl_spgist;
+
Same code formatted as a patch.
пн, 10 авг. 2020 г. в 17:45, Pavel Borisov <pashkin.elfe@gmail.com>:
Show quoted text
Also little bit corrected code formatting.
Best regards,
Pavel BorisovPostgres Professional: http://postgrespro.com
<http://www.postgrespro.com>
Attachments:
v3-0001-Covering-SpGist.patchapplication/octet-stream; name=v3-0001-Covering-SpGist.patchDownload
From 6a44de4f93259d2325242b0080591b2799cf7a8c Mon Sep 17 00:00:00 2001
From: Pavel Borisov <pashkin.elfe@gmail.com>
Date: Mon, 10 Aug 2020 20:09:48 +0400
Subject: [PATCH v3] Covering SpGist
---
src/backend/access/spgist/spgdoinsert.c | 172 +++++---
src/backend/access/spgist/spginsert.c | 5 +-
src/backend/access/spgist/spgscan.c | 87 +++-
src/backend/access/spgist/spgutils.c | 382 ++++++++++++++++--
src/backend/access/spgist/spgvacuum.c | 25 +-
src/backend/access/spgist/spgxlog.c | 6 +-
src/include/access/spgist_private.h | 160 +++++---
src/test/regress/expected/amutils.out | 4 +-
src/test/regress/expected/index_including.out | 1 -
.../expected/index_including_spgist.out | 139 +++++++
src/test/regress/parallel_schedule | 2 +-
src/test/regress/serial_schedule | 1 +
.../regress/sql/index_including_spgist.sql | 81 ++++
13 files changed, 885 insertions(+), 180 deletions(-)
create mode 100644 src/test/regress/expected/index_including_spgist.out
create mode 100644 src/test/regress/sql/index_including_spgist.sql
diff --git a/src/backend/access/spgist/spgdoinsert.c b/src/backend/access/spgist/spgdoinsert.c
index 934d65b89f..4c133b7106 100644
--- a/src/backend/access/spgist/spgdoinsert.c
+++ b/src/backend/access/spgist/spgdoinsert.c
@@ -22,7 +22,7 @@
#include "miscadmin.h"
#include "storage/bufmgr.h"
#include "utils/rel.h"
-
+#include "access/htup_details.h"
/*
* SPPageDesc tracks all info about a page we are inserting into. In some
@@ -220,7 +220,7 @@ addLeafTuple(Relation index, SpGistState *state, SpGistLeafTuple leafTuple,
SpGistBlockIsRoot(current->blkno))
{
/* Tuple is not part of a chain */
- leafTuple->nextOffset = InvalidOffsetNumber;
+ SGLT_SET_OFFSET(leafTuple->nextOffset, InvalidOffsetNumber);
current->offnum = SpGistPageAddNewItem(state, current->page,
(Item) leafTuple, leafTuple->size,
NULL, false);
@@ -253,7 +253,7 @@ addLeafTuple(Relation index, SpGistState *state, SpGistLeafTuple leafTuple,
PageGetItemId(current->page, current->offnum));
if (head->tupstate == SPGIST_LIVE)
{
- leafTuple->nextOffset = head->nextOffset;
+ SGLT_SET_OFFSET(leafTuple->nextOffset, SGLT_GET_OFFSET(head->nextOffset));
offnum = SpGistPageAddNewItem(state, current->page,
(Item) leafTuple, leafTuple->size,
NULL, false);
@@ -264,14 +264,14 @@ addLeafTuple(Relation index, SpGistState *state, SpGistLeafTuple leafTuple,
*/
head = (SpGistLeafTuple) PageGetItem(current->page,
PageGetItemId(current->page, current->offnum));
- head->nextOffset = offnum;
+ SGLT_SET_OFFSET(head->nextOffset, offnum);
xlrec.offnumLeaf = offnum;
xlrec.offnumHeadLeaf = current->offnum;
}
else if (head->tupstate == SPGIST_DEAD)
{
- leafTuple->nextOffset = InvalidOffsetNumber;
+ SGLT_SET_OFFSET(leafTuple->nextOffset, InvalidOffsetNumber);
PageIndexTupleDelete(current->page, current->offnum);
if (PageAddItem(current->page,
(Item) leafTuple, leafTuple->size,
@@ -362,13 +362,13 @@ checkSplitConditions(Relation index, SpGistState *state,
{
/* We could see a DEAD tuple as first/only chain item */
Assert(i == current->offnum);
- Assert(it->nextOffset == InvalidOffsetNumber);
+ Assert(SGLT_GET_OFFSET(it->nextOffset) == InvalidOffsetNumber);
/* Don't count it in result, because it won't go to other page */
}
else
elog(ERROR, "unexpected SPGiST tuple state: %d", it->tupstate);
- i = it->nextOffset;
+ i = SGLT_GET_OFFSET(it->nextOffset);
}
*nToSplit = n;
@@ -437,7 +437,7 @@ moveLeafs(Relation index, SpGistState *state,
{
/* We could see a DEAD tuple as first/only chain item */
Assert(i == current->offnum);
- Assert(it->nextOffset == InvalidOffsetNumber);
+ Assert(SGLT_GET_OFFSET(it->nextOffset) == InvalidOffsetNumber);
/* We don't want to move it, so don't count it in size */
toDelete[nDelete] = i;
nDelete++;
@@ -446,7 +446,7 @@ moveLeafs(Relation index, SpGistState *state,
else
elog(ERROR, "unexpected SPGiST tuple state: %d", it->tupstate);
- i = it->nextOffset;
+ i = SGLT_GET_OFFSET(it->nextOffset);
}
/* Find a leaf page that will hold them */
@@ -475,7 +475,7 @@ moveLeafs(Relation index, SpGistState *state,
* don't care). We're modifying the tuple on the source page
* here, but it's okay since we're about to delete it.
*/
- it->nextOffset = r;
+ SGLT_SET_OFFSET(it->nextOffset, r);
r = SpGistPageAddNewItem(state, npage, (Item) it, it->size,
&startOffset, false);
@@ -490,7 +490,7 @@ moveLeafs(Relation index, SpGistState *state,
}
/* add the new tuple as well */
- newLeafTuple->nextOffset = r;
+ SGLT_SET_OFFSET(newLeafTuple->nextOffset, r);
r = SpGistPageAddNewItem(state, npage,
(Item) newLeafTuple, newLeafTuple->size,
&startOffset, false);
@@ -709,6 +709,9 @@ doPickSplit(Relation index, SpGistState *state,
int nToDelete,
nToInsert,
maxToInclude;
+ Datum *leafChainDatums;
+ bool *leafChainIsnulls;
+ const int natts = IndexRelationGetNumberOfAttributes(index);
in.level = level;
@@ -723,14 +726,16 @@ doPickSplit(Relation index, SpGistState *state,
toInsert = (OffsetNumber *) palloc(sizeof(OffsetNumber) * n);
newLeafs = (SpGistLeafTuple *) palloc(sizeof(SpGistLeafTuple) * n);
leafPageSelect = (uint8 *) palloc(sizeof(uint8) * n);
-
STORE_STATE(state, xlrec.stateSrc);
+ leafChainDatums = (Datum *) palloc(n * natts * sizeof(Datum));
+ leafChainIsnulls = (bool *) palloc(n * natts * sizeof(bool));
+
/*
- * Form list of leaf tuples which will be distributed as split result;
- * also, count up the amount of space that will be freed from current.
- * (Note that in the non-root case, we won't actually delete the old
- * tuples, only replace them with redirects or placeholders.)
+ * Collect leaf tuples which will be distributed as split result; also,
+ * count up the amount of space that will be freed from current. (Note
+ * that in the non-root case, we won't actually delete the old tuples,
+ * only replace them with redirects or placeholders.)
*
* Note: the SGLTDATUM calls here are safe even when dealing with a nulls
* page. For a pass-by-value data type we will fetch a word that must
@@ -738,7 +743,15 @@ doPickSplit(Relation index, SpGistState *state,
* tuples must have size at least SGDTSIZE). For a pass-by-reference type
* we are just computing a pointer that isn't going to get dereferenced.
* So it's not worth guarding the calls with isNulls checks.
+ *
+ * Datums and isnulls of all leaf tuple attributes in a chain are
+ * collected into 2-d arrays: (number of tuples in chain) x (number of
+ * attributes) First attribute is key, the other - included attributes (if
+ * any). After picksplit we need to form new leaf tuples as key attribute
+ * length can change which can affect alignment of every include
+ * attribute.
*/
+
nToInsert = 0;
nToDelete = 0;
spaceToDelete = 0;
@@ -759,6 +772,8 @@ doPickSplit(Relation index, SpGistState *state,
{
in.datums[nToInsert] = SGLTDATUM(it, state);
heapPtrs[nToInsert] = it->heapPtr;
+ SpGistDeformLeafTuple(it, state, leafChainDatums + nToInsert * natts,
+ leafChainIsnulls + nToInsert * natts, isNulls);
nToInsert++;
toDelete[nToDelete] = i;
nToDelete++;
@@ -784,6 +799,9 @@ doPickSplit(Relation index, SpGistState *state,
{
in.datums[nToInsert] = SGLTDATUM(it, state);
heapPtrs[nToInsert] = it->heapPtr;
+
+ SpGistDeformLeafTuple(it, state, leafChainDatums + nToInsert * natts,
+ leafChainIsnulls + nToInsert * natts, isNulls);
nToInsert++;
toDelete[nToDelete] = i;
nToDelete++;
@@ -795,7 +813,7 @@ doPickSplit(Relation index, SpGistState *state,
{
/* We could see a DEAD tuple as first/only chain item */
Assert(i == current->offnum);
- Assert(it->nextOffset == InvalidOffsetNumber);
+ Assert(SGLT_GET_OFFSET(it->nextOffset) == InvalidOffsetNumber);
toDelete[nToDelete] = i;
nToDelete++;
/* replacing it with redirect will save no space */
@@ -803,7 +821,7 @@ doPickSplit(Relation index, SpGistState *state,
else
elog(ERROR, "unexpected SPGiST tuple state: %d", it->tupstate);
- i = it->nextOffset;
+ i = SGLT_GET_OFFSET(it->nextOffset);
}
}
in.nTuples = nToInsert;
@@ -816,10 +834,17 @@ doPickSplit(Relation index, SpGistState *state,
*/
in.datums[in.nTuples] = SGLTDATUM(newLeafTuple, state);
heapPtrs[in.nTuples] = newLeafTuple->heapPtr;
+
+ SpGistDeformLeafTuple(newLeafTuple, state, leafChainDatums + (in.nTuples) * natts,
+ leafChainIsnulls + (in.nTuples) * natts, isNulls);
in.nTuples++;
memset(&out, 0, sizeof(out));
+ /*
+ * Process collected key values of tuples from the chain. Included values
+ * are used to build fresh leaf tuples unchanged.
+ */
if (!isNulls)
{
/*
@@ -837,9 +862,11 @@ doPickSplit(Relation index, SpGistState *state,
totalLeafSizes = 0;
for (i = 0; i < in.nTuples; i++)
{
- newLeafs[i] = spgFormLeafTuple(state, heapPtrs + i,
- out.leafTupleDatums[i],
- false);
+ *(leafChainDatums + i * natts) = (Datum) out.leafTupleDatums[i];
+ *(leafChainIsnulls + i * natts) = false;
+
+ newLeafs[i] = spgFormLeafTuple(state, heapPtrs + i, leafChainDatums + i * natts,
+ leafChainIsnulls + i * natts);
totalLeafSizes += newLeafs[i]->size + sizeof(ItemIdData);
}
}
@@ -860,9 +887,14 @@ doPickSplit(Relation index, SpGistState *state,
totalLeafSizes = 0;
for (i = 0; i < in.nTuples; i++)
{
- newLeafs[i] = spgFormLeafTuple(state, heapPtrs + i,
- (Datum) 0,
- true);
+ /*
+ * Nulls tree can contain only null key values.
+ */
+ *(leafChainDatums + i * natts) = (Datum) 0;
+ *(leafChainIsnulls + i * natts) = true;
+
+ newLeafs[i] = spgFormLeafTuple(state, heapPtrs + i, leafChainDatums + i * natts,
+ leafChainIsnulls + i * natts);
totalLeafSizes += newLeafs[i]->size + sizeof(ItemIdData);
}
}
@@ -1196,10 +1228,10 @@ doPickSplit(Relation index, SpGistState *state,
if (ItemPointerIsValid(&nodes[n]->t_tid))
{
Assert(ItemPointerGetBlockNumber(&nodes[n]->t_tid) == leafBlock);
- it->nextOffset = ItemPointerGetOffsetNumber(&nodes[n]->t_tid);
+ SGLT_SET_OFFSET(it->nextOffset, ItemPointerGetOffsetNumber(&nodes[n]->t_tid));
}
else
- it->nextOffset = InvalidOffsetNumber;
+ SGLT_SET_OFFSET(it->nextOffset, InvalidOffsetNumber);
/* Insert it on page */
newoffset = SpGistPageAddNewItem(state, BufferGetPage(leafBuffer),
@@ -1889,67 +1921,83 @@ spgSplitNodeAction(Relation index, SpGistState *state,
*/
bool
spgdoinsert(Relation index, SpGistState *state,
- ItemPointer heapPtr, Datum datum, bool isnull)
+ ItemPointer heapPtr, Datum *datum, bool *isnull)
{
int level = 0;
- Datum leafDatum;
+ Datum *leafDatum;
int leafSize;
SPPageDesc current,
parent;
FmgrInfo *procinfo = NULL;
+ int i;
/*
* Look up FmgrInfo of the user-defined choose function once, to save
* cycles in the loop below.
*/
- if (!isnull)
+ if (!isnull[0])
procinfo = index_getprocinfo(index, 1, SPGIST_CHOOSE_PROC);
/*
* Prepare the leaf datum to insert.
- *
+ */
+
+ leafDatum = (Datum *) palloc0(sizeof(Datum) * (IndexRelationGetNumberOfAttributes(index)));
+
+ /*
* If an optional "compress" method is provided, then call it to form the
- * leaf datum from the input datum. Otherwise store the input datum as
- * is. Since we don't use index_form_tuple in this AM, we have to make
- * sure value to be inserted is not toasted; FormIndexDatum doesn't
- * guarantee that. But we assume the "compress" method to return an
- * untoasted value.
+ * key datum from the input datum. Otherwise store the input datum as is.
+ * Since we don't use index_form_tuple in this AM, we have to make sure
+ * value to be inserted is not toasted; FormIndexDatum doesn't guarantee
+ * that. But we assume the "compress" method to return an untoasted
+ * value.
*/
- if (!isnull)
+ if (!isnull[0])
{
if (OidIsValid(index_getprocid(index, 1, SPGIST_COMPRESS_PROC)))
{
FmgrInfo *compressProcinfo = NULL;
compressProcinfo = index_getprocinfo(index, 1, SPGIST_COMPRESS_PROC);
- leafDatum = FunctionCall1Coll(compressProcinfo,
- index->rd_indcollation[0],
- datum);
+ leafDatum[0] = FunctionCall1Coll(compressProcinfo,
+ index->rd_indcollation[0],
+ datum[0]);
}
else
{
Assert(state->attLeafType.type == state->attType.type);
if (state->attType.attlen == -1)
- leafDatum = PointerGetDatum(PG_DETOAST_DATUM(datum));
+ leafDatum[0] = PointerGetDatum(PG_DETOAST_DATUM(datum[0]));
else
- leafDatum = datum;
+ leafDatum[0] = datum[0];
}
}
else
- leafDatum = (Datum) 0;
+ leafDatum[0] = (Datum) 0;
+
+ for (i = 1; i < IndexRelationGetNumberOfAttributes(index); i++)
+ {
+ if (!isnull[i])
+ {
+ if (TupleDescAttr(state->includeTupdesc, i - 1)->attlen == -1)
+ leafDatum[i] = PointerGetDatum(PG_DETOAST_DATUM(datum[i]));
+ else
+ leafDatum[i] = datum[i];
+ }
+ else
+ leafDatum[i] = (Datum) 0;
+ }
+
/*
- * Compute space needed for a leaf tuple containing the given datum.
+ * Compute space needed on a page for a leaf tuple containing the given
+ * datum.
*
* If it isn't gonna fit, and the opclass can't reduce the datum size by
* suffixing, bail out now rather than getting into an endless loop.
*/
- if (!isnull)
- leafSize = SGLTHDRSZ + sizeof(ItemIdData) +
- SpGistGetTypeSize(&state->attLeafType, leafDatum);
- else
- leafSize = SGDTSIZE + sizeof(ItemIdData);
+ leafSize = SpgLeafSize(state, leafDatum, isnull) + sizeof(ItemIdData);
if (leafSize > SPGIST_PAGE_CAPACITY && !state->config.longValuesOK)
ereport(ERROR,
@@ -1961,7 +2009,7 @@ spgdoinsert(Relation index, SpGistState *state,
errhint("Values larger than a buffer page cannot be indexed.")));
/* Initialize "current" to the appropriate root page */
- current.blkno = isnull ? SPGIST_NULL_BLKNO : SPGIST_ROOT_BLKNO;
+ current.blkno = isnull[0] ? SPGIST_NULL_BLKNO : SPGIST_ROOT_BLKNO;
current.buffer = InvalidBuffer;
current.page = NULL;
current.offnum = FirstOffsetNumber;
@@ -1995,7 +2043,7 @@ spgdoinsert(Relation index, SpGistState *state,
*/
current.buffer =
SpGistGetBuffer(index,
- GBUF_LEAF | (isnull ? GBUF_NULLS : 0),
+ GBUF_LEAF | (isnull[0] ? GBUF_NULLS : 0),
Min(leafSize, SPGIST_PAGE_CAPACITY),
&isNew);
current.blkno = BufferGetBlockNumber(current.buffer);
@@ -2037,7 +2085,7 @@ spgdoinsert(Relation index, SpGistState *state,
current.page = BufferGetPage(current.buffer);
/* should not arrive at a page of the wrong type */
- if (isnull ? !SpGistPageStoresNulls(current.page) :
+ if (isnull[0] ? !SpGistPageStoresNulls(current.page) :
SpGistPageStoresNulls(current.page))
elog(ERROR, "SPGiST index page %u has wrong nulls flag",
current.blkno);
@@ -2054,7 +2102,7 @@ spgdoinsert(Relation index, SpGistState *state,
{
/* it fits on page, so insert it and we're done */
addLeafTuple(index, state, leafTuple,
- ¤t, &parent, isnull, isNew);
+ ¤t, &parent, isnull[0], isNew);
break;
}
else if ((sizeToSplit =
@@ -2068,14 +2116,14 @@ spgdoinsert(Relation index, SpGistState *state,
* chain to another leaf page rather than splitting it.
*/
Assert(!isNew);
- moveLeafs(index, state, ¤t, &parent, leafTuple, isnull);
+ moveLeafs(index, state, ¤t, &parent, leafTuple, isnull[0]);
break; /* we're done */
}
else
{
/* picksplit */
if (doPickSplit(index, state, ¤t, &parent,
- leafTuple, level, isnull, isNew))
+ leafTuple, level, isnull[0], isNew))
break; /* doPickSplit installed new tuples */
/* leaf tuple will not be inserted yet */
@@ -2110,8 +2158,8 @@ spgdoinsert(Relation index, SpGistState *state,
innerTuple = (SpGistInnerTuple) PageGetItem(current.page,
PageGetItemId(current.page, current.offnum));
- in.datum = datum;
- in.leafDatum = leafDatum;
+ in.datum = datum[0];
+ in.leafDatum = leafDatum[0];
in.level = level;
in.allTheSame = innerTuple->allTheSame;
in.hasPrefix = (innerTuple->prefixSize > 0);
@@ -2121,7 +2169,7 @@ spgdoinsert(Relation index, SpGistState *state,
memset(&out, 0, sizeof(out));
- if (!isnull)
+ if (!isnull[0])
{
/* use user-defined choose method */
FunctionCall2Coll(procinfo,
@@ -2158,11 +2206,11 @@ spgdoinsert(Relation index, SpGistState *state,
/* Adjust level as per opclass request */
level += out.result.matchNode.levelAdd;
/* Replace leafDatum and recompute leafSize */
- if (!isnull)
+ if (!isnull[0])
{
- leafDatum = out.result.matchNode.restDatum;
- leafSize = SGLTHDRSZ + sizeof(ItemIdData) +
- SpGistGetTypeSize(&state->attLeafType, leafDatum);
+ leafDatum[0] = out.result.matchNode.restDatum;
+ leafSize = SpgLeafSize(state, leafDatum, isnull) +
+ sizeof(ItemIdData);
}
/*
@@ -2227,6 +2275,6 @@ spgdoinsert(Relation index, SpGistState *state,
SpGistSetLastUsedPage(index, parent.buffer);
UnlockReleaseBuffer(parent.buffer);
}
-
+ pfree(leafDatum);
return true;
}
diff --git a/src/backend/access/spgist/spginsert.c b/src/backend/access/spgist/spginsert.c
index e4508a2b92..b54ae85f6e 100644
--- a/src/backend/access/spgist/spginsert.c
+++ b/src/backend/access/spgist/spginsert.c
@@ -55,8 +55,7 @@ spgistBuildCallback(Relation index, ItemPointer tid, Datum *values,
* lock on some buffer. So we need to be willing to retry. We can flush
* any temp data when retrying.
*/
- while (!spgdoinsert(index, &buildstate->spgstate, tid,
- *values, *isnull))
+ while (!spgdoinsert(index, &buildstate->spgstate, tid, values, isnull))
{
MemoryContextReset(buildstate->tmpCtx);
}
@@ -226,7 +225,7 @@ spginsert(Relation index, Datum *values, bool *isnull,
* to avoid cumulative memory consumption. That means we also have to
* redo initSpGistState(), but it's cheap enough not to matter.
*/
- while (!spgdoinsert(index, &spgstate, ht_ctid, *values, *isnull))
+ while (!spgdoinsert(index, &spgstate, ht_ctid, values, isnull))
{
MemoryContextReset(insertCtx);
initSpGistState(&spgstate, index);
diff --git a/src/backend/access/spgist/spgscan.c b/src/backend/access/spgist/spgscan.c
index 4d506bfb9a..5a3c7c50cf 100644
--- a/src/backend/access/spgist/spgscan.c
+++ b/src/backend/access/spgist/spgscan.c
@@ -28,7 +28,8 @@
typedef void (*storeRes_func) (SpGistScanOpaque so, ItemPointer heapPtr,
Datum leafValue, bool isNull, bool recheck,
- bool recheckDistances, double *distances);
+ bool recheckDistances, double *distances,
+ SpGistLeafTuple leafTuple);
/*
* Pairing heap comparison function for the SpGistSearchItem queue.
@@ -88,6 +89,9 @@ spgFreeSearchItem(SpGistScanOpaque so, SpGistSearchItem *item)
if (item->traversalValue)
pfree(item->traversalValue);
+ if (item->isLeaf && item->leafTuple)
+ pfree(item->leafTuple);
+
pfree(item);
}
@@ -134,6 +138,8 @@ spgAddStartItem(SpGistScanOpaque so, bool isnull)
startEntry->recheck = false;
startEntry->recheckDistances = false;
+ startEntry->leafTuple = NULL;
+
spgAddSearchItemToQueue(so, startEntry);
}
@@ -438,14 +444,30 @@ spgendscan(IndexScanDesc scan)
* Leaf SpGistSearchItem constructor, called in queue context
*/
static SpGistSearchItem *
-spgNewHeapItem(SpGistScanOpaque so, int level, ItemPointer heapPtr,
+spgNewHeapItem(SpGistScanOpaque so, int level, SpGistLeafTuple leafTuple,
Datum leafValue, bool recheck, bool recheckDistances,
bool isnull, double *distances)
{
SpGistSearchItem *item = spgAllocSearchItem(so, isnull, distances);
+ /*
+ * If there are include attributes search item in the queue should contain
+ * them.
+ */
+ if (so->state.includeTupdesc)
+ {
+ Assert(so->state.includeTupdesc->natts);
+
+ item->leafTuple = palloc(leafTuple->size);
+ memcpy(item->leafTuple, leafTuple, leafTuple->size);
+ }
+ else
+ {
+ item->leafTuple = NULL;
+ }
+
item->level = level;
- item->heapPtr = *heapPtr;
+ item->heapPtr = leafTuple->heapPtr;
/* copy value to queue cxt out of tmp cxt */
item->value = isnull ? (Datum) 0 :
datumCopy(leafValue, so->state.attLeafType.attbyval,
@@ -503,6 +525,8 @@ spgLeafTest(SpGistScanOpaque so, SpGistSearchItem *item,
in.returnData = so->want_itup;
in.leafDatum = SGLTDATUM(leafTuple, &so->state);
+
+
out.leafValue = (Datum) 0;
out.recheck = false;
out.distances = NULL;
@@ -528,7 +552,7 @@ spgLeafTest(SpGistScanOpaque so, SpGistSearchItem *item,
/* the scan is ordered -> add the item to the queue */
MemoryContext oldCxt = MemoryContextSwitchTo(so->traversalCxt);
SpGistSearchItem *heapItem = spgNewHeapItem(so, item->level,
- &leafTuple->heapPtr,
+ leafTuple,
leafValue,
recheck,
recheckDistances,
@@ -543,8 +567,10 @@ spgLeafTest(SpGistScanOpaque so, SpGistSearchItem *item,
{
/* non-ordered scan, so report the item right away */
Assert(!recheckDistances);
+
storeRes(so, &leafTuple->heapPtr, leafValue, isnull,
- recheck, false, NULL);
+ recheck, false, NULL, leafTuple);
+
*reportedSome = true;
}
}
@@ -736,7 +762,7 @@ spgTestLeafTuple(SpGistScanOpaque so,
/* dead tuple should be first in chain */
Assert(offset == ItemPointerGetOffsetNumber(&item->heapPtr));
/* No live entries on this page */
- Assert(leafTuple->nextOffset == InvalidOffsetNumber);
+ Assert(SGLT_GET_OFFSET(leafTuple->nextOffset) == InvalidOffsetNumber);
return SpGistBreakOffsetNumber;
}
}
@@ -750,7 +776,7 @@ spgTestLeafTuple(SpGistScanOpaque so,
spgLeafTest(so, item, leafTuple, isnull, reportedSome, storeRes);
- return leafTuple->nextOffset;
+ return SGLT_GET_OFFSET(leafTuple->nextOffset);
}
/*
@@ -782,8 +808,8 @@ redirect:
{
/* We store heap items in the queue only in case of ordered search */
Assert(so->numberOfNonNullOrderBys > 0);
- storeRes(so, &item->heapPtr, item->value, item->isNull,
- item->recheck, item->recheckDistances, item->distances);
+ storeRes(so, &item->heapPtr, item->value, item->isNull, item->recheck,
+ item->recheckDistances, item->distances, item->leafTuple);
reportedSome = true;
}
else
@@ -877,7 +903,7 @@ redirect:
static void
storeBitmap(SpGistScanOpaque so, ItemPointer heapPtr,
Datum leafValue, bool isnull, bool recheck, bool recheckDistances,
- double *distances)
+ double *distances, SpGistLeafTuple leafTuple)
{
Assert(!recheckDistances && !distances);
tbm_add_tuples(so->tbm, heapPtr, 1, recheck);
@@ -904,7 +930,7 @@ spggetbitmap(IndexScanDesc scan, TIDBitmap *tbm)
static void
storeGettuple(SpGistScanOpaque so, ItemPointer heapPtr,
Datum leafValue, bool isnull, bool recheck, bool recheckDistances,
- double *nonNullDistances)
+ double *nonNullDistances, SpGistLeafTuple leafTuple)
{
Assert(so->nPtrs < MaxIndexTuplesPerPage);
so->heapPtrs[so->nPtrs] = *heapPtr;
@@ -949,9 +975,38 @@ storeGettuple(SpGistScanOpaque so, ItemPointer heapPtr,
* Reconstruct index data. We have to copy the datum out of the temp
* context anyway, so we may as well create the tuple here.
*/
- so->reconTups[so->nPtrs] = heap_form_tuple(so->indexTupDesc,
- &leafValue,
- &isnull);
+ if (so->state.includeTupdesc)
+ {
+ /* Add included attributes */
+ Datum *leafDatums;
+ bool *leafIsnulls;
+
+ Assert(so->state.includeTupdesc->natts);
+
+ leafDatums = (Datum *) palloc(sizeof(Datum) * (so->state.includeTupdesc->natts + 1));
+ leafIsnulls = (bool *) palloc(sizeof(bool) * (so->state.includeTupdesc->natts + 1));
+
+ SpGistDeformLeafTuple(leafTuple, &so->state, leafDatums, leafIsnulls, isnull);
+
+ /*
+ * override key value extracted from LeafTuple in case we've
+ * reconstructed it already
+ */
+ leafDatums[0] = leafValue;
+ leafIsnulls[0] = isnull;
+
+ so->reconTups[so->nPtrs] = heap_form_tuple(so->indexTupDesc,
+ leafDatums,
+ leafIsnulls);
+ pfree(leafDatums);
+ pfree(leafIsnulls);
+ }
+ else
+ {
+ so->reconTups[so->nPtrs] = heap_form_tuple(so->indexTupDesc,
+ &leafValue,
+ &isnull);
+ }
}
so->nPtrs++;
}
@@ -1019,6 +1074,10 @@ spgcanreturn(Relation index, int attno)
{
SpGistCache *cache;
+ /* Included attributes always can be fetched for index-only scans */
+ if (attno > 1)
+ return true;
+
/* We can do it if the opclass config function says so */
cache = spgGetCache(index);
diff --git a/src/backend/access/spgist/spgutils.c b/src/backend/access/spgist/spgutils.c
index 0efe05e552..3ca47ff53d 100644
--- a/src/backend/access/spgist/spgutils.c
+++ b/src/backend/access/spgist/spgutils.c
@@ -31,7 +31,18 @@
#include "utils/index_selfuncs.h"
#include "utils/lsyscache.h"
#include "utils/syscache.h"
+#include "access/itup.h"
+#include "access/detoast.h"
+#include "access/toast_internals.h"
+#include "access/heaptoast.h"
+#include "utils/expandeddatum.h"
+/* Does att's datatype allow packing into the 1-byte-header varlena format? */
+#define ATT_IS_PACKABLE(att) \
+ ((att)->attlen == -1 && (att)->attstorage != TYPSTORAGE_PLAIN)
+
+Size spgIncludedDataSize(TupleDesc tupleDesc, Datum *values,
+ bool *isnull, Size start);
/*
* SP-GiST handler function: return IndexAmRoutine with access method parameters
@@ -49,7 +60,7 @@ spghandler(PG_FUNCTION_ARGS)
amroutine->amcanorderbyop = true;
amroutine->amcanbackward = false;
amroutine->amcanunique = false;
- amroutine->amcanmulticol = false;
+ amroutine->amcanmulticol = true;
amroutine->amoptionalkey = true;
amroutine->amsearcharray = false;
amroutine->amsearchnulls = true;
@@ -57,7 +68,7 @@ spghandler(PG_FUNCTION_ARGS)
amroutine->amclusterable = false;
amroutine->ampredlocks = false;
amroutine->amcanparallel = false;
- amroutine->amcaninclude = false;
+ amroutine->amcaninclude = true;
amroutine->amusemaintenanceworkmem = false;
amroutine->amparallelvacuumoptions =
VACUUM_OPTION_PARALLEL_BULKDEL | VACUUM_OPTION_PARALLEL_COND_CLEANUP;
@@ -116,14 +127,21 @@ spgGetCache(Relation index)
cache = MemoryContextAllocZero(index->rd_indexcxt,
sizeof(SpGistCache));
- /* SPGiST doesn't support multi-column indexes */
- Assert(index->rd_att->natts == 1);
+ /*
+ * SPGiST should have one key column and can also have included
+ * columns
+ */
+ if (IndexRelationGetNumberOfKeyAttributes(index) != 1)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("SPGiST index can have only one key column")));
/*
- * Get the actual data type of the indexed column from the index
- * tupdesc. We pass this to the opclass config function so that
- * polymorphic opclasses are possible.
+ * Get the actual data type of the key column from the index tupdesc.
+ * We pass this to the opclass config function so that polymorphic
+ * opclasses are possible.
*/
+
atttype = TupleDescAttr(index->rd_att, 0)->atttypid;
/* Call the config function to get config info for the opclass */
@@ -156,6 +174,7 @@ spgGetCache(Relation index)
fillTypeDesc(&cache->attPrefixType, cache->config.prefixType);
fillTypeDesc(&cache->attLabelType, cache->config.labelType);
+
/* Last, get the lastUsedPages data from the metapage */
metabuffer = ReadBuffer(index, SPGIST_METAPAGE_BLKNO);
LockBuffer(metabuffer, BUFFER_LOCK_SHARE);
@@ -177,7 +196,23 @@ spgGetCache(Relation index)
/* assume it's up to date */
cache = (SpGistCache *) index->rd_amcache;
}
+ /* Form descriptor for included columns if any */
+ if (IndexRelationGetNumberOfAttributes(index) > 1)
+ {
+ int i;
+
+ cache->includeTupdesc = CreateTemplateTupleDesc(
+ IndexRelationGetNumberOfAttributes(index) - 1);
+ for (i = 0; i < IndexRelationGetNumberOfAttributes(index) - 1; i++)
+ {
+ TupleDescInitEntry(cache->includeTupdesc, i + 1, NULL,
+ TupleDescAttr(index->rd_att, i + 1)->atttypid,
+ -1, 0);
+ }
+ }
+ else
+ cache->includeTupdesc = NULL;
return cache;
}
@@ -190,6 +225,7 @@ initSpGistState(SpGistState *state, Relation index)
/* Get cached static information about index */
cache = spgGetCache(index);
+ state->includeTupdesc = cache->includeTupdesc;
state->config = cache->config;
state->attType = cache->attType;
state->attLeafType = cache->attLeafType;
@@ -603,7 +639,7 @@ spgoptions(Datum reloptions, bool validate)
/*
* Get the space needed to store a non-null datum of the indicated type.
- * Note the result is already rounded up to a MAXALIGN boundary.
+ * Note the result is not maxaligned and this should be done by caller if needed.
* Also, we follow the SPGiST convention that pass-by-val types are
* just stored in their Datum representation (compare memcpyDatum).
*/
@@ -619,7 +655,7 @@ SpGistGetTypeSize(SpGistTypeDesc *att, Datum datum)
else
size = VARSIZE_ANY(datum);
- return MAXALIGN(size);
+ return size;
}
/*
@@ -642,36 +678,202 @@ memcpyDatum(void *target, SpGistTypeDesc *att, Datum datum)
}
/*
- * Construct a leaf tuple containing the given heap TID and datum value
+ * Private version of heap_compute_data_size with start address not
+ * necessarily MAXALIGNed. The reason is that start address (and alignment)
+ * influence alignment of each of next values and overall size of included
+ * data area in SpGiST leaf tuple.
+ */
+Size
+spgIncludedDataSize(TupleDesc tupleDesc,
+ Datum *values,
+ bool *isnull, Size start)
+{
+ Size data_length = 0;
+ int i;
+ int numberOfAttributes = tupleDesc->natts;
+
+ data_length = start;
+ for (i = 0; i < numberOfAttributes; i++)
+ {
+ Datum val;
+ Form_pg_attribute atti;
+
+ if (isnull[i])
+ continue;
+
+ val = values[i];
+ atti = TupleDescAttr(tupleDesc, i);
+
+ if (ATT_IS_PACKABLE(atti) &&
+ VARATT_CAN_MAKE_SHORT(DatumGetPointer(val)))
+ {
+ /*
+ * we're anticipating converting to a short varlena header, so
+ * adjust length and don't count any alignment
+ */
+ data_length += VARATT_CONVERTED_SHORT_SIZE(DatumGetPointer(val));
+ }
+ else if (atti->attlen == -1 &&
+ VARATT_IS_EXTERNAL_EXPANDED(DatumGetPointer(val)))
+ {
+ /*
+ * we want to flatten the expanded value so that the constructed
+ * tuple doesn't depend on it
+ */
+ data_length = att_align_nominal(data_length, atti->attalign);
+ data_length += EOH_get_flat_size(DatumGetEOHP(val));
+ }
+ else
+ {
+ data_length = att_align_datum(data_length, atti->attalign,
+ atti->attlen, val);
+ data_length = att_addlength_datum(data_length, atti->attlen,
+ val);
+ }
+ }
+ return data_length - start;
+}
+
+/* Calculate overall leaf tuple size. SGLTHDRSZ is MAXALIGNed only for backward
+ * compatibility and there might be gap between header and key data. After key
+ * data there are no such gaps more than is is necessary for each value
+ * alignment. Overall result is MAXALIGNed.*/
+unsigned int
+SpgLeafSize(SpGistState *state, Datum *datum, bool *isnull)
+{
+ /* compute space needed, nullmask size and offset for include attributes */
+ unsigned int size = SGLTHDRSZ;
+ unsigned int i;
+
+ if (!isnull[0])
+ /* key attribute size (not maxaligned) */
+ size += SpGistGetTypeSize(&state->attLeafType, datum[0]);
+
+ if (state->includeTupdesc)
+ {
+ Assert(state->includeTupdesc->natts);
+ if (state->includeTupdesc->natts + 1 >= INDEX_MAX_KEYS)
+ ereport(ERROR,
+ (errcode(ERRCODE_TOO_MANY_COLUMNS),
+ errmsg("number of index columns (%d) exceeds limit (%d)",
+ state->includeTupdesc->natts, INDEX_MAX_KEYS)));
+ /* nullmask size */
+ for (i = 1; i <= state->includeTupdesc->natts; i++)
+ {
+ if (isnull[i])
+ {
+ size += (state->includeTupdesc->natts / 8) + 1;
+ break;
+ }
+ }
+ /* overall included attributes size each with added proper alignment. */
+ size += spgIncludedDataSize(state->includeTupdesc, datum + 1, isnull + 1, size);
+ }
+ return MAXALIGN(size);
+}
+
+/*
+ * Construct a leaf tuple containing the given heap TID, key data and included
+ * columns data. Key data starts from MAXALIGN boundary for backward compatibility.
+ * Nullmask apply only to included attributes and is placed just after key data if
+ * there is at least one NULL among included attributes. It doesn't need alignment.
+ * Then all included columns data follow aligned by their typealign's.
*/
SpGistLeafTuple
spgFormLeafTuple(SpGistState *state, ItemPointer heapPtr,
- Datum datum, bool isnull)
+ Datum *datum, bool *isnull)
{
SpGistLeafTuple tup;
- unsigned int size;
+ unsigned int size = SGLTHDRSZ;
+ unsigned int include_offset = 0;
+ unsigned int nullmask_size = 0;
+ unsigned int data_offset = 0;
+ unsigned int data_size = 0;
+ uint16 tupmask = 0;
+ int i;
- /* compute space needed (note result is already maxaligned) */
- size = SGLTHDRSZ;
- if (!isnull)
- size += SpGistGetTypeSize(&state->attLeafType, datum);
+ /*
+ * Calculate space needed. If there are include attributes also calculate
+ * sizes and offsets needed for heap_fill_tuple
+ */
+ if (!isnull[0])
+ /* key attribute size (not maxaligned) */
+ size += SpGistGetTypeSize(&state->attLeafType, datum[0]);
+
+ if (state->includeTupdesc)
+ {
+ Assert(state->includeTupdesc->natts);
+ if (state->includeTupdesc->natts + 1 >= INDEX_MAX_KEYS)
+ ereport(ERROR,
+ (errcode(ERRCODE_TOO_MANY_COLUMNS),
+ errmsg("number of index columns (%d) exceeds limit (%d)",
+ state->includeTupdesc->natts, INDEX_MAX_KEYS)));
+
+ include_offset = size;
+
+ for (i = 1; i <= state->includeTupdesc->natts; i++)
+ {
+ if (isnull[i])
+ {
+ nullmask_size = (state->includeTupdesc->natts / 8) + 1;
+ size += nullmask_size;
+ break;
+ }
+ }
+
+ /*
+ * Alignment of all included attributes is counted inside data_size.
+ * data_offset itself is not aligned.
+ */
+ data_size = spgIncludedDataSize(state->includeTupdesc, datum + 1, isnull + 1, size);
+ data_offset = size;
+
+ size += data_size;
+ }
/*
* Ensure that we can replace the tuple with a dead tuple later. This
- * test is unnecessary when !isnull, but let's be safe.
+ * test is unnecessary when !isnull[0], but let's be safe.
*/
if (size < SGDTSIZE)
size = SGDTSIZE;
/* OK, form the tuple */
- tup = (SpGistLeafTuple) palloc0(size);
+ tup = (SpGistLeafTuple) palloc0(MAXALIGN(size));
- tup->size = size;
- tup->nextOffset = InvalidOffsetNumber;
+ tup->size = MAXALIGN(size);
+ SGLT_SET_OFFSET(tup->nextOffset, InvalidOffsetNumber);
tup->heapPtr = *heapPtr;
- if (!isnull)
- memcpyDatum(SGLTDATAPTR(tup), &state->attLeafType, datum);
+ if (!isnull[0])
+ memcpyDatum(SGLTDATAPTR(tup), &state->attLeafType, datum[0]);
+
+ /* Add included columns data to leaf tuple if any. */
+ if (state->includeTupdesc)
+ {
+ /*
+ * The start of include attributes tuple is not aligned by default.
+ * All values alignment should be done by heap_fill_tuple
+ * automaticaly. If there is a nulls mask it is included just after
+ * key attribute data and it should not be aligned.
+ */
+ heap_fill_tuple(state->includeTupdesc, datum + 1, isnull + 1,
+ (char *) tup + data_offset,
+ data_size, &tupmask,
+ (nullmask_size ? (bits8 *) tup + include_offset : NULL));
+
+ if (nullmask_size)
+ SGLT_SET_CONTAINSNULLMASK(tup->nextOffset, 1);
+
+ /*
+ * We do this because heap_fill_tuple wants to initialize a "tupmask"
+ * which is used for HeapTuples, but the only relevant info is the
+ * "has variable attributes" field. We have already set the hasnull
+ * bit above.
+ */
+ if (tupmask & HEAP_HASVARWIDTH)
+ SGLT_SET_CONTAINSVARATT(tup->nextOffset, 1);
+ }
return tup;
}
@@ -688,10 +890,10 @@ spgFormNodeTuple(SpGistState *state, Datum label, bool isnull)
unsigned int size;
unsigned short infomask = 0;
- /* compute space needed (note result is already maxaligned) */
+ /* compute space needed */
size = SGNTHDRSZ;
if (!isnull)
- size += SpGistGetTypeSize(&state->attLabelType, label);
+ size += MAXALIGN(SpGistGetTypeSize(&state->attLabelType, label));
/*
* Here we make sure that the size will fit in the field reserved for it
@@ -735,7 +937,7 @@ spgFormInnerTuple(SpGistState *state, bool hasPrefix, Datum prefix,
/* Compute size needed */
if (hasPrefix)
- prefixSize = SpGistGetTypeSize(&state->attPrefixType, prefix);
+ prefixSize = MAXALIGN(SpGistGetTypeSize(&state->attPrefixType, prefix));
else
prefixSize = 0;
@@ -1046,3 +1248,133 @@ spgproperty(Oid index_oid, int attno,
return true;
}
+
+/*
+ * Convert an SpGist tuple into palloc'd Datum/isnull arrays.
+ *
+ */
+void
+SpGistDeformLeafTuple(SpGistLeafTuple tup, SpGistState *state, Datum *datum, bool *isnull,
+ bool key_isnull)
+{
+ unsigned int include_offset; /* offset of include data */
+ int off;
+ bits8 *nullmask_ptr = NULL; /* ptr to null bitmap in tuple */
+ char *tp;
+ bool slow = false; /* can we use/set attcacheoff? */
+ int i;
+
+ if (key_isnull)
+ {
+ datum[0] = (Datum) 0;
+ isnull[0] = true;
+ }
+ else
+ {
+ datum[0] = SGLTDATUM(tup, state);
+ isnull[0] = false;
+ }
+
+ if (state->includeTupdesc)
+ {
+ Assert(state->includeTupdesc->natts);
+ if (state->includeTupdesc->natts + 1 >= INDEX_MAX_KEYS)
+ ereport(ERROR,
+ (errcode(ERRCODE_TOO_MANY_COLUMNS),
+ errmsg("number of index columns (%d) exceeds limit (%d)",
+ state->includeTupdesc->natts, INDEX_MAX_KEYS)));
+
+ include_offset = key_isnull ? SGLTHDRSZ : SGLTHDRSZ + SpGistGetTypeSize(&state->attLeafType, datum[0]);
+
+ tp = (char *) tup;
+ off = include_offset;
+
+ if (SGLT_GET_CONTAINSNULLMASK(tup->nextOffset))
+ {
+ nullmask_ptr = (bits8 *) tp + include_offset;
+ off += (state->includeTupdesc->natts) / 8 + 1;
+ }
+
+ if (state->attLeafType.attlen > 0 && !SGLT_GET_CONTAINSVARATT(tup->nextOffset) &&
+ !SGLT_GET_CONTAINSNULLMASK(tup->nextOffset))
+ /* can use attcacheoff for all attributes */
+ {
+ for (i = 1; i <= state->includeTupdesc->natts; i++)
+ {
+ Form_pg_attribute thisatt = TupleDescAttr(state->includeTupdesc, i - 1);
+
+ isnull[i] = false;
+ if (thisatt->attcacheoff >= 0)
+ off = thisatt->attcacheoff;
+ else
+ {
+ off = att_align_nominal(off, thisatt->attalign);
+ thisatt->attcacheoff = off;
+ }
+ datum[i] = fetchatt(thisatt, tp + off);
+ off = att_addlength_pointer(off, thisatt->attlen, tp + off);
+ }
+ }
+ else
+
+ /*
+ * general case: can use cache until first null or varlen
+ * attribute
+ */
+ {
+ if (state->attLeafType.attlen <= 0)
+ slow = true; /* can't use attcacheoff at all */
+
+ for (i = 1; i <= state->includeTupdesc->natts; i++)
+ {
+ Form_pg_attribute thisatt = TupleDescAttr(state->includeTupdesc, i - 1);
+
+ if (SGLT_GET_CONTAINSNULLMASK(tup->nextOffset))
+ {
+ if (att_isnull(i - 1, nullmask_ptr))
+ {
+ datum[i] = (Datum) 0;
+ isnull[i] = true;
+ slow = true; /* can't use attcacheoff anymore */
+ continue;
+ }
+ }
+
+ isnull[i] = false;
+
+ if (!slow && thisatt->attcacheoff >= 0)
+ off = thisatt->attcacheoff;
+ else if (thisatt->attlen == -1)
+ {
+ /*
+ * We can only cache the offset for a varlena attribute if
+ * the offset is already suitably aligned, so that there
+ * would be no pad bytes in any case: then the offset will
+ * be valid for either an aligned or unaligned value.
+ */
+ if (!slow && off == att_align_nominal(off, thisatt->attalign))
+ thisatt->attcacheoff = off;
+ else
+ {
+ off = att_align_pointer(off, thisatt->attalign, -1, tp + off);
+ slow = true;
+ }
+ }
+ else
+ {
+ /* not varlena, so safe to use att_align_nominal */
+ off = att_align_nominal(off, thisatt->attalign);
+
+ if (!slow)
+ thisatt->attcacheoff = off;
+ }
+
+ datum[i] = fetchatt(thisatt, tp + off);
+ off = att_addlength_pointer(off, thisatt->attlen, tp + off);
+
+ if (thisatt->attlen <= 0)
+ slow = true; /* can't use attcacheoff anymore */
+ }
+ }
+ }
+}
diff --git a/src/backend/access/spgist/spgvacuum.c b/src/backend/access/spgist/spgvacuum.c
index bd98707f3c..a0d76901fc 100644
--- a/src/backend/access/spgist/spgvacuum.c
+++ b/src/backend/access/spgist/spgvacuum.c
@@ -168,23 +168,28 @@ vacuumLeafPage(spgBulkDeleteState *bds, Relation index, Buffer buffer,
}
/* Form predecessor map, too */
- if (lt->nextOffset != InvalidOffsetNumber)
+ if (SGLT_GET_OFFSET(lt->nextOffset) != InvalidOffsetNumber)
{
/* paranoia about corrupted chain links */
- if (lt->nextOffset < FirstOffsetNumber ||
- lt->nextOffset > max ||
- predecessor[lt->nextOffset] != InvalidOffsetNumber)
+ if (SGLT_GET_OFFSET(lt->nextOffset) < FirstOffsetNumber ||
+ SGLT_GET_OFFSET(lt->nextOffset) > max ||
+ predecessor[SGLT_GET_OFFSET(lt->nextOffset)] != InvalidOffsetNumber)
elog(ERROR, "inconsistent tuple chain links in page %u of index \"%s\"",
BufferGetBlockNumber(buffer),
RelationGetRelationName(index));
- predecessor[lt->nextOffset] = i;
+ predecessor[SGLT_GET_OFFSET(lt->nextOffset)] = i;
}
}
else if (lt->tupstate == SPGIST_REDIRECT)
{
SpGistDeadTuple dt = (SpGistDeadTuple) lt;
- Assert(dt->nextOffset == InvalidOffsetNumber);
+ /*
+ * Dead tuple nextOffset is allowed to have any values of two
+ * highest bits in case it is inherited from SpGistLeafTuple where
+ * these bits has their own meaning.
+ */
+ Assert(SGLT_GET_OFFSET(dt->nextOffset) == InvalidOffsetNumber);
Assert(ItemPointerIsValid(&dt->pointer));
/*
@@ -201,7 +206,7 @@ vacuumLeafPage(spgBulkDeleteState *bds, Relation index, Buffer buffer,
}
else
{
- Assert(lt->nextOffset == InvalidOffsetNumber);
+ Assert(SGLT_GET_OFFSET(lt->nextOffset) == InvalidOffsetNumber);
}
}
@@ -250,7 +255,7 @@ vacuumLeafPage(spgBulkDeleteState *bds, Relation index, Buffer buffer,
prevLive = deletable[i] ? InvalidOffsetNumber : i;
/* scan down the chain ... */
- j = head->nextOffset;
+ j = SGLT_GET_OFFSET(head->nextOffset);
while (j != InvalidOffsetNumber)
{
SpGistLeafTuple lt;
@@ -301,7 +306,7 @@ vacuumLeafPage(spgBulkDeleteState *bds, Relation index, Buffer buffer,
interveningDeletable = false;
}
- j = lt->nextOffset;
+ j = SGLT_GET_OFFSET(lt->nextOffset);
}
if (prevLive == InvalidOffsetNumber)
@@ -366,7 +371,7 @@ vacuumLeafPage(spgBulkDeleteState *bds, Relation index, Buffer buffer,
lt = (SpGistLeafTuple) PageGetItem(page,
PageGetItemId(page, chainSrc[i]));
Assert(lt->tupstate == SPGIST_LIVE);
- lt->nextOffset = chainDest[i];
+ SGLT_SET_OFFSET(lt->nextOffset, chainDest[i]);
}
MarkBufferDirty(buffer);
diff --git a/src/backend/access/spgist/spgxlog.c b/src/backend/access/spgist/spgxlog.c
index 7be2291d07..4022e3af07 100644
--- a/src/backend/access/spgist/spgxlog.c
+++ b/src/backend/access/spgist/spgxlog.c
@@ -122,8 +122,8 @@ spgRedoAddLeaf(XLogReaderState *record)
head = (SpGistLeafTuple) PageGetItem(page,
PageGetItemId(page, xldata->offnumHeadLeaf));
- Assert(head->nextOffset == leafTupleHdr.nextOffset);
- head->nextOffset = xldata->offnumLeaf;
+ Assert(SGLT_GET_OFFSET(head->nextOffset) == SGLT_GET_OFFSET(leafTupleHdr.nextOffset));
+ SGLT_SET_OFFSET(head->nextOffset, xldata->offnumLeaf);
}
}
else
@@ -822,7 +822,7 @@ spgRedoVacuumLeaf(XLogReaderState *record)
lt = (SpGistLeafTuple) PageGetItem(page,
PageGetItemId(page, chainSrc[i]));
Assert(lt->tupstate == SPGIST_LIVE);
- lt->nextOffset = chainDest[i];
+ SGLT_SET_OFFSET(lt->nextOffset, chainDest[i]);
}
PageSetLSN(page, lsn);
diff --git a/src/include/access/spgist_private.h b/src/include/access/spgist_private.h
index 00b98ec6a0..8d03adb8f5 100644
--- a/src/include/access/spgist_private.h
+++ b/src/include/access/spgist_private.h
@@ -141,6 +141,7 @@ typedef struct SpGistState
SpGistTypeDesc attLeafType; /* type of leaf-tuple values */
SpGistTypeDesc attPrefixType; /* type of inner-tuple prefix values */
SpGistTypeDesc attLabelType; /* type of node label values */
+ TupleDesc includeTupdesc; /* tuple descriptor of included columns */
char *deadTupleStorage; /* workspace for spgFormDeadTuple */
@@ -148,6 +149,98 @@ typedef struct SpGistState
bool isBuild; /* true if doing index build */
} SpGistState;
+/*
+ * SPGiST leaf tuple: carries a datum and a heap tuple TID
+ *
+ * In the simplest case, the datum is the same as the indexed value; but
+ * it could also be a suffix or some other sort of delta that permits
+ * reconstruction given knowledge of the prefix path traversed to get here.
+ *
+ * The size field is wider than could possibly be needed for an on-disk leaf
+ * tuple, but this allows us to form leaf tuples even when the datum is too
+ * wide to be stored immediately, and it costs nothing because of alignment
+ * considerations.
+ *
+ * Normally, nextOffset links to the next tuple belonging to the same parent
+ * node (which must be on the same page). But when the root page is a leaf
+ * page, we don't chain its tuples, so nextOffset is always 0 on the root.
+ *
+ * size must be a multiple of MAXALIGN; also, it must be at least SGDTSIZE
+ * so that the tuple can be converted to REDIRECT status later. (This
+ * restriction only adds bytes for the null-datum case, otherwise alignment
+ * restrictions force it anyway.)
+ *
+ * In a leaf tuple for a NULL indexed value, there's no useful datum value;
+ * however, the SGDTSIZE limit ensures that's there's a Datum word there
+ * anyway, so SGLTDATUM can be applied safely as long as you don't do
+ * anything with the result.
+ *
+ * Minimum space to store SpGistLeafTuple on a page is 12 bytes tuple header
+ * and 4 bytes ItemIdData so 14 lower bits of nextOffset (accessed as
+ * SGLT_GET/SET_OFFSET) is enough to store actual tuple number on a page even
+ * if page size is 64Kb. Two higher bits are to store per-tuple
+ * information is there nulls mask exist and is there any included attribute
+ * of variable length type.
+ */
+
+typedef struct SpGistLeafTupleData
+{
+ unsigned int tupstate:2, /* LIVE/REDIRECT/DEAD/PLACEHOLDER */
+ size:30; /* large enough for any palloc'able value */
+ OffsetNumber nextOffset; /* higher 1 bit = 1 if included values has
+ * nulls, 2 bit = 1 if included values contain
+ * variable length values, lower 15 bits - is
+ * "actual" nextOffset i.e. number of next
+ * tuple in chain on a page, or
+ * InvalidOffsetNumber. They SHOULD NOT be
+ * set/read directly,
+ * SGLT_SET_XXX/SGLT_GET_XXX macros must be
+ * used instead. */
+ ItemPointerData heapPtr; /* TID of represented heap tuple */
+ /* leaf datum follows */
+
+ /*
+ * if SGLT_GET_CONTAINSNULLMASK nullmask follows. Its size (number of
+ * included columns/8)+1
+ */
+ /* include attributes follow if any */
+} SpGistLeafTupleData;
+
+typedef SpGistLeafTupleData *SpGistLeafTuple;
+
+#define SGLTHDRSZ MAXALIGN(sizeof(SpGistLeafTupleData))
+#define SGLTDATAPTR(x) (((char *) (x)) + SGLTHDRSZ)
+#define SGLTDATUM(x, s) ((s)->attLeafType.attbyval ? \
+ *(Datum *) SGLTDATAPTR(x) : \
+ PointerGetDatum(SGLTDATAPTR(x)))
+/*
+ * Accessor macros to get and set actual 14-bit offset and two bit flags from/to
+ * nextOffset value.
+ */
+#define SGLT_GET_OFFSET(x) ( (x) & 0x3FFF )
+#define SGLT_GET_CONTAINSNULLMASK(x) ( (x) >> 15 )
+#define SGLT_GET_CONTAINSVARATT(x) ( ( (x) & 4000 ) >> 14 )
+#define SGLT_SET_OFFSET(x,o) ( (x) = ( (x) & 0xC000 ) | ( (o) & 0x3FFF) )
+#define SGLT_SET_CONTAINSNULLMASK(x,n) ( (x) = ( (n) << 15 ) | ( (x) & 0x3FFF ) )
+#define SGLT_SET_CONTAINSVARATT(x,v) ( (x) = ( (v) << 14 ) | ( (x) & 0xBFFF ) )
+
+#define SGLT_GET_INCLUDE_TUPSIZE(x) SGLT_GET_OFFSET(x)
+#define SGLT_SET_INCLUDE_TUPSIZE(x,o) SGLT_SET_OFFSET(x,o)
+
+extern char *SpGistFormIncludeTuple(TupleDesc tupleDescriptor, Datum *values,
+ bool *isnull, uint16 *tupdata);
+
+/*
+ * SPGiST dead tuple: declaration for examining non-live tuples
+ *
+ * The tupstate field of this struct must match those of regular inner and
+ * leaf tuples, and its size field must match a leaf tuple's.
+ * Also, the pointer field must be in the same place as a leaf tuple's heapPtr
+ * field, to satisfy some Asserts that we make when replacing a leaf tuple
+ * with a dead tuple.
+ * We don't use nextOffset, but it's needed to align the pointer field.
+ */
+
typedef struct SpGistSearchItem
{
pairingheap_node phNode; /* pairing heap node */
@@ -160,14 +253,14 @@ typedef struct SpGistSearchItem
bool isLeaf; /* SearchItem is heap item */
bool recheck; /* qual recheck is needed */
bool recheckDistances; /* distance recheck is needed */
-
+ SpGistLeafTuple leafTuple;
/* array with numberOfOrderBys entries */
double distances[FLEXIBLE_ARRAY_MEMBER];
+ /* if there are include columns SpGistLeafTupleData follow */
} SpGistSearchItem;
#define SizeOfSpGistSearchItem(n_distances) \
(offsetof(SpGistSearchItem, distances) + sizeof(double) * (n_distances))
-
/*
* Private state of an index scan
*/
@@ -241,6 +334,7 @@ typedef struct SpGistCache
SpGistTypeDesc attLeafType; /* type of leaf-tuple values */
SpGistTypeDesc attPrefixType; /* type of inner-tuple prefix values */
SpGistTypeDesc attLabelType; /* type of node label values */
+ TupleDesc includeTupdesc;
SpGistLUPCache lastUsedPages; /* local storage of last-used info */
} SpGistCache;
@@ -321,60 +415,6 @@ typedef SpGistNodeTupleData *SpGistNodeTuple;
*(Datum *) SGNTDATAPTR(x) : \
PointerGetDatum(SGNTDATAPTR(x)))
-/*
- * SPGiST leaf tuple: carries a datum and a heap tuple TID
- *
- * In the simplest case, the datum is the same as the indexed value; but
- * it could also be a suffix or some other sort of delta that permits
- * reconstruction given knowledge of the prefix path traversed to get here.
- *
- * The size field is wider than could possibly be needed for an on-disk leaf
- * tuple, but this allows us to form leaf tuples even when the datum is too
- * wide to be stored immediately, and it costs nothing because of alignment
- * considerations.
- *
- * Normally, nextOffset links to the next tuple belonging to the same parent
- * node (which must be on the same page). But when the root page is a leaf
- * page, we don't chain its tuples, so nextOffset is always 0 on the root.
- *
- * size must be a multiple of MAXALIGN; also, it must be at least SGDTSIZE
- * so that the tuple can be converted to REDIRECT status later. (This
- * restriction only adds bytes for the null-datum case, otherwise alignment
- * restrictions force it anyway.)
- *
- * In a leaf tuple for a NULL indexed value, there's no useful datum value;
- * however, the SGDTSIZE limit ensures that's there's a Datum word there
- * anyway, so SGLTDATUM can be applied safely as long as you don't do
- * anything with the result.
- */
-typedef struct SpGistLeafTupleData
-{
- unsigned int tupstate:2, /* LIVE/REDIRECT/DEAD/PLACEHOLDER */
- size:30; /* large enough for any palloc'able value */
- OffsetNumber nextOffset; /* next tuple in chain, or InvalidOffsetNumber */
- ItemPointerData heapPtr; /* TID of represented heap tuple */
- /* leaf datum follows */
-} SpGistLeafTupleData;
-
-typedef SpGistLeafTupleData *SpGistLeafTuple;
-
-#define SGLTHDRSZ MAXALIGN(sizeof(SpGistLeafTupleData))
-#define SGLTDATAPTR(x) (((char *) (x)) + SGLTHDRSZ)
-#define SGLTDATUM(x, s) ((s)->attLeafType.attbyval ? \
- *(Datum *) SGLTDATAPTR(x) : \
- PointerGetDatum(SGLTDATAPTR(x)))
-
-/*
- * SPGiST dead tuple: declaration for examining non-live tuples
- *
- * The tupstate field of this struct must match those of regular inner and
- * leaf tuples, and its size field must match a leaf tuple's.
- * Also, the pointer field must be in the same place as a leaf tuple's heapPtr
- * field, to satisfy some Asserts that we make when replacing a leaf tuple
- * with a dead tuple.
- * We don't use nextOffset, but it's needed to align the pointer field.
- * pointer and xid are only valid when tupstate = REDIRECT.
- */
typedef struct SpGistDeadTupleData
{
unsigned int tupstate:2, /* LIVE/REDIRECT/DEAD/PLACEHOLDER */
@@ -394,7 +434,6 @@ typedef SpGistDeadTupleData *SpGistDeadTuple;
* size plus sizeof(ItemIdData) (for the line pointer). This works correctly
* so long as tuple sizes are always maxaligned.
*/
-
/* Page capacity after allowing for fixed header and special space */
#define SPGIST_PAGE_CAPACITY \
MAXALIGN_DOWN(BLCKSZ - \
@@ -456,9 +495,10 @@ extern void SpGistInitPage(Page page, uint16 f);
extern void SpGistInitBuffer(Buffer b, uint16 f);
extern void SpGistInitMetapage(Page page);
extern unsigned int SpGistGetTypeSize(SpGistTypeDesc *att, Datum datum);
+extern unsigned int SpgLeafSize(SpGistState *state, Datum *datum, bool *isnull);
extern SpGistLeafTuple spgFormLeafTuple(SpGistState *state,
ItemPointer heapPtr,
- Datum datum, bool isnull);
+ Datum *datum, bool *isnull);
extern SpGistNodeTuple spgFormNodeTuple(SpGistState *state,
Datum label, bool isnull);
extern SpGistInnerTuple spgFormInnerTuple(SpGistState *state,
@@ -466,6 +506,8 @@ extern SpGistInnerTuple spgFormInnerTuple(SpGistState *state,
int nNodes, SpGistNodeTuple *nodes);
extern SpGistDeadTuple spgFormDeadTuple(SpGistState *state, int tupstate,
BlockNumber blkno, OffsetNumber offnum);
+extern void SpGistDeformLeafTuple(SpGistLeafTuple tup, SpGistState *state,
+ Datum *datum, bool *isnull, bool key_value_isnull);
extern Datum *spgExtractNodeLabels(SpGistState *state,
SpGistInnerTuple innerTuple);
extern OffsetNumber SpGistPageAddNewItem(SpGistState *state, Page page,
@@ -484,7 +526,7 @@ extern void spgPageIndexMultiDelete(SpGistState *state, Page page,
int firststate, int reststate,
BlockNumber blkno, OffsetNumber offnum);
extern bool spgdoinsert(Relation index, SpGistState *state,
- ItemPointer heapPtr, Datum datum, bool isnull);
+ ItemPointer heapPtr, Datum *datum, bool *isnull);
/* spgproc.c */
extern double *spg_key_orderbys_distances(Datum key, bool isLeaf,
diff --git a/src/test/regress/expected/amutils.out b/src/test/regress/expected/amutils.out
index d92a6d12c6..93e6a43b6d 100644
--- a/src/test/regress/expected/amutils.out
+++ b/src/test/regress/expected/amutils.out
@@ -169,9 +169,9 @@ select amname, prop, pg_indexam_has_property(a.oid, prop) as p
hash | bogus |
spgist | can_order | f
spgist | can_unique | f
- spgist | can_multi_col | f
+ spgist | can_multi_col | t
spgist | can_exclude | t
- spgist | can_include | f
+ spgist | can_include | t
spgist | bogus |
(36 rows)
diff --git a/src/test/regress/expected/index_including.out b/src/test/regress/expected/index_including.out
index 8e5d53e712..4fd2b7e878 100644
--- a/src/test/regress/expected/index_including.out
+++ b/src/test/regress/expected/index_including.out
@@ -356,7 +356,6 @@ CREATE INDEX on tbl USING brin(c1, c2) INCLUDE (c3, c4);
ERROR: access method "brin" does not support included columns
CREATE INDEX on tbl USING gist(c3) INCLUDE (c1, c4);
CREATE INDEX on tbl USING spgist(c3) INCLUDE (c4);
-ERROR: access method "spgist" does not support included columns
CREATE INDEX on tbl USING gin(c1, c2) INCLUDE (c3, c4);
ERROR: access method "gin" does not support included columns
CREATE INDEX on tbl USING hash(c1, c2) INCLUDE (c3, c4);
diff --git a/src/test/regress/expected/index_including_spgist.out b/src/test/regress/expected/index_including_spgist.out
new file mode 100644
index 0000000000..fa64766fb7
--- /dev/null
+++ b/src/test/regress/expected/index_including_spgist.out
@@ -0,0 +1,139 @@
+/*
+ * 1.1. test CREATE INDEX with buffered build
+ */
+-- Regular index with included columns
+CREATE TABLE tbl_spgist (c1 int, c2 int, c3 int, c4 box);
+-- size is chosen to exceed page size and trigger actual truncation
+INSERT INTO tbl_spgist SELECT x, 2*x, 3*x, box(point(x,x+1),point(2*x,2*x+1)) FROM generate_series(1,8000) AS x;
+CREATE INDEX tbl_spgist_idx ON tbl_spgist using spgist (c4) INCLUDE (c1,c2,c3);
+SELECT pg_get_indexdef(i.indexrelid)
+FROM pg_index i JOIN pg_class c ON i.indexrelid = c.oid
+WHERE i.indrelid = 'tbl_spgist'::regclass ORDER BY c.relname;
+ pg_get_indexdef
+-----------------------------------------------------------------------------------------
+ CREATE INDEX tbl_spgist_idx ON public.tbl_spgist USING spgist (c4) INCLUDE (c1, c2, c3)
+(1 row)
+
+SELECT * FROM tbl_spgist where c4 <@ box(point(1,1),point(10,10));
+ c1 | c2 | c3 | c4
+----+----+----+-------------
+ 1 | 2 | 3 | (2,3),(1,2)
+ 2 | 4 | 6 | (4,5),(2,3)
+ 3 | 6 | 9 | (6,7),(3,4)
+ 4 | 8 | 12 | (8,9),(4,5)
+(4 rows)
+
+SET enable_bitmapscan TO off;
+EXPLAIN (costs off) SELECT * FROM tbl_spgist where c4 <@ box(point(1,1),point(10,10));
+ QUERY PLAN
+----------------------------------------------------
+ Index Only Scan using tbl_spgist_idx on tbl_spgist
+ Index Cond: (c4 <@ '(10,10),(1,1)'::box)
+(2 rows)
+
+SET enable_bitmapscan TO default;
+DROP TABLE tbl_spgist;
+/*
+ * 1.2. test CREATE INDEX with inserts
+ */
+-- Regular index with included columns
+CREATE TABLE tbl_spgist (c1 int, c2 int, c3 int, c4 box);
+-- size is chosen to exceed page size and trigger actual truncation
+CREATE INDEX tbl_spgist_idx ON tbl_spgist using spgist (c4) INCLUDE (c1,c2,c3);
+INSERT INTO tbl_spgist SELECT x, 2*x, 3*x, box(point(x,x+1),point(2*x,2*x+1)) FROM generate_series(1,8000) AS x;
+SELECT pg_get_indexdef(i.indexrelid)
+FROM pg_index i JOIN pg_class c ON i.indexrelid = c.oid
+WHERE i.indrelid = 'tbl_spgist'::regclass ORDER BY c.relname;
+ pg_get_indexdef
+-----------------------------------------------------------------------------------------
+ CREATE INDEX tbl_spgist_idx ON public.tbl_spgist USING spgist (c4) INCLUDE (c1, c2, c3)
+(1 row)
+
+SELECT * FROM tbl_spgist where c4 <@ box(point(1,1),point(10,10));
+ c1 | c2 | c3 | c4
+----+----+----+-------------
+ 1 | 2 | 3 | (2,3),(1,2)
+ 2 | 4 | 6 | (4,5),(2,3)
+ 3 | 6 | 9 | (6,7),(3,4)
+ 4 | 8 | 12 | (8,9),(4,5)
+(4 rows)
+
+SET enable_bitmapscan TO off;
+EXPLAIN (costs off) SELECT * FROM tbl_spgist where c4 <@ box(point(1,1),point(10,10));
+ QUERY PLAN
+----------------------------------------------------
+ Index Only Scan using tbl_spgist_idx on tbl_spgist
+ Index Cond: (c4 <@ '(10,10),(1,1)'::box)
+(2 rows)
+
+SET enable_bitmapscan TO default;
+DROP TABLE tbl_spgist;
+/*
+ * 2. CREATE INDEX CONCURRENTLY
+ */
+CREATE TABLE tbl_spgist (c1 int, c2 int, c3 int, c4 box);
+INSERT INTO tbl_spgist SELECT x, 2*x, 3*x, box(point(x,x+1),point(2*x,2*x+1)) FROM generate_series(1,10) AS x;
+CREATE INDEX CONCURRENTLY tbl_spgist_idx ON tbl_spgist using spgist (c4) INCLUDE (c1,c2,c3);
+SELECT indexdef FROM pg_indexes WHERE tablename = 'tbl_spgist' ORDER BY indexname;
+ indexdef
+-----------------------------------------------------------------------------------------
+ CREATE INDEX tbl_spgist_idx ON public.tbl_spgist USING spgist (c4) INCLUDE (c1, c2, c3)
+(1 row)
+
+DROP TABLE tbl_spgist;
+/*
+ * 3. REINDEX
+ */
+CREATE TABLE tbl_spgist (c1 int, c2 int, c3 int, c4 box);
+INSERT INTO tbl_spgist SELECT x, 2*x, 3*x, box(point(x,x+1),point(2*x,2*x+1)) FROM generate_series(1,10) AS x;
+CREATE INDEX tbl_spgist_idx ON tbl_spgist using spgist (c4) INCLUDE (c1,c3);
+SELECT indexdef FROM pg_indexes WHERE tablename = 'tbl_spgist' ORDER BY indexname;
+ indexdef
+-------------------------------------------------------------------------------------
+ CREATE INDEX tbl_spgist_idx ON public.tbl_spgist USING spgist (c4) INCLUDE (c1, c3)
+(1 row)
+
+REINDEX INDEX tbl_spgist_idx;
+SELECT indexdef FROM pg_indexes WHERE tablename = 'tbl_spgist' ORDER BY indexname;
+ indexdef
+-------------------------------------------------------------------------------------
+ CREATE INDEX tbl_spgist_idx ON public.tbl_spgist USING spgist (c4) INCLUDE (c1, c3)
+(1 row)
+
+ALTER TABLE tbl_spgist DROP COLUMN c1;
+SELECT indexdef FROM pg_indexes WHERE tablename = 'tbl_spgist' ORDER BY indexname;
+ indexdef
+----------
+(0 rows)
+
+DROP TABLE tbl_spgist;
+/*
+ * 4. Update, delete values in indexed table.
+ */
+CREATE TABLE tbl_spgist (c1 int, c2 int, c3 int, c4 box);
+INSERT INTO tbl_spgist SELECT x, 2*x, 3*x, box(point(x,x+1),point(2*x,2*x+1)) FROM generate_series(1,10) AS x;
+CREATE INDEX tbl_spgist_idx ON tbl_spgist using spgist (c4) INCLUDE (c1,c3);
+UPDATE tbl_spgist SET c1 = 100 WHERE c1 = 2;
+UPDATE tbl_spgist SET c1 = 1 WHERE c1 = 3;
+DELETE FROM tbl_spgist WHERE c1 = 5 OR c3 = 12;
+DROP TABLE tbl_spgist;
+/*
+ * 5. Alter column type.
+ */
+CREATE TABLE tbl_spgist (c1 int, c2 int, c3 int, c4 box);
+INSERT INTO tbl_spgist SELECT x, 2*x, 3*x, box(point(x,x+1),point(2*x,2*x+1)) FROM generate_series(1,10) AS x;
+CREATE INDEX tbl_spgist_idx ON tbl_spgist using spgist (c4) INCLUDE (c1,c3);
+ALTER TABLE tbl_spgist ALTER c1 TYPE bigint;
+ALTER TABLE tbl_spgist ALTER c3 TYPE bigint;
+\d tbl_spgist
+ Table "public.tbl_spgist"
+ Column | Type | Collation | Nullable | Default
+--------+---------+-----------+----------+---------
+ c1 | bigint | | |
+ c2 | integer | | |
+ c3 | bigint | | |
+ c4 | box | | |
+Indexes:
+ "tbl_spgist_idx" spgist (c4) INCLUDE (c1, c3)
+
+DROP TABLE tbl_spgist;
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 026ea880cd..985458a1a8 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -50,7 +50,7 @@ test: copy copyselect copydml insert insert_conflict
# ----------
test: create_misc create_operator create_procedure
# These depend on create_misc and create_operator
-test: create_index create_index_spgist create_view index_including index_including_gist
+test: create_index create_index_spgist create_view index_including index_including_gist index_including_spgist
# ----------
# Another group of parallel tests
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index 979d926119..f3df961535 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -68,6 +68,7 @@ test: create_index_spgist
test: create_view
test: index_including
test: index_including_gist
+test: index_including_spgist
test: create_aggregate
test: create_function_3
test: create_cast
diff --git a/src/test/regress/sql/index_including_spgist.sql b/src/test/regress/sql/index_including_spgist.sql
new file mode 100644
index 0000000000..a59e73aa22
--- /dev/null
+++ b/src/test/regress/sql/index_including_spgist.sql
@@ -0,0 +1,81 @@
+/*
+ * 1.1. test CREATE INDEX with buffered build
+ */
+
+-- Regular index with included columns
+CREATE TABLE tbl_spgist (c1 int, c2 int, c3 int, c4 box);
+-- size is chosen to exceed page size and trigger actual truncation
+INSERT INTO tbl_spgist SELECT x, 2*x, 3*x, box(point(x,x+1),point(2*x,2*x+1)) FROM generate_series(1,8000) AS x;
+CREATE INDEX tbl_spgist_idx ON tbl_spgist using spgist (c4) INCLUDE (c1,c2,c3);
+SELECT pg_get_indexdef(i.indexrelid)
+FROM pg_index i JOIN pg_class c ON i.indexrelid = c.oid
+WHERE i.indrelid = 'tbl_spgist'::regclass ORDER BY c.relname;
+SELECT * FROM tbl_spgist where c4 <@ box(point(1,1),point(10,10));
+SET enable_bitmapscan TO off;
+EXPLAIN (costs off) SELECT * FROM tbl_spgist where c4 <@ box(point(1,1),point(10,10));
+SET enable_bitmapscan TO default;
+DROP TABLE tbl_spgist;
+
+/*
+ * 1.2. test CREATE INDEX with inserts
+ */
+
+-- Regular index with included columns
+CREATE TABLE tbl_spgist (c1 int, c2 int, c3 int, c4 box);
+-- size is chosen to exceed page size and trigger actual truncation
+CREATE INDEX tbl_spgist_idx ON tbl_spgist using spgist (c4) INCLUDE (c1,c2,c3);
+INSERT INTO tbl_spgist SELECT x, 2*x, 3*x, box(point(x,x+1),point(2*x,2*x+1)) FROM generate_series(1,8000) AS x;
+SELECT pg_get_indexdef(i.indexrelid)
+FROM pg_index i JOIN pg_class c ON i.indexrelid = c.oid
+WHERE i.indrelid = 'tbl_spgist'::regclass ORDER BY c.relname;
+SELECT * FROM tbl_spgist where c4 <@ box(point(1,1),point(10,10));
+SET enable_bitmapscan TO off;
+EXPLAIN (costs off) SELECT * FROM tbl_spgist where c4 <@ box(point(1,1),point(10,10));
+SET enable_bitmapscan TO default;
+DROP TABLE tbl_spgist;
+
+/*
+ * 2. CREATE INDEX CONCURRENTLY
+ */
+CREATE TABLE tbl_spgist (c1 int, c2 int, c3 int, c4 box);
+INSERT INTO tbl_spgist SELECT x, 2*x, 3*x, box(point(x,x+1),point(2*x,2*x+1)) FROM generate_series(1,10) AS x;
+CREATE INDEX CONCURRENTLY tbl_spgist_idx ON tbl_spgist using spgist (c4) INCLUDE (c1,c2,c3);
+SELECT indexdef FROM pg_indexes WHERE tablename = 'tbl_spgist' ORDER BY indexname;
+DROP TABLE tbl_spgist;
+
+
+/*
+ * 3. REINDEX
+ */
+CREATE TABLE tbl_spgist (c1 int, c2 int, c3 int, c4 box);
+INSERT INTO tbl_spgist SELECT x, 2*x, 3*x, box(point(x,x+1),point(2*x,2*x+1)) FROM generate_series(1,10) AS x;
+CREATE INDEX tbl_spgist_idx ON tbl_spgist using spgist (c4) INCLUDE (c1,c3);
+SELECT indexdef FROM pg_indexes WHERE tablename = 'tbl_spgist' ORDER BY indexname;
+REINDEX INDEX tbl_spgist_idx;
+SELECT indexdef FROM pg_indexes WHERE tablename = 'tbl_spgist' ORDER BY indexname;
+ALTER TABLE tbl_spgist DROP COLUMN c1;
+SELECT indexdef FROM pg_indexes WHERE tablename = 'tbl_spgist' ORDER BY indexname;
+DROP TABLE tbl_spgist;
+
+/*
+ * 4. Update, delete values in indexed table.
+ */
+CREATE TABLE tbl_spgist (c1 int, c2 int, c3 int, c4 box);
+INSERT INTO tbl_spgist SELECT x, 2*x, 3*x, box(point(x,x+1),point(2*x,2*x+1)) FROM generate_series(1,10) AS x;
+CREATE INDEX tbl_spgist_idx ON tbl_spgist using spgist (c4) INCLUDE (c1,c3);
+UPDATE tbl_spgist SET c1 = 100 WHERE c1 = 2;
+UPDATE tbl_spgist SET c1 = 1 WHERE c1 = 3;
+DELETE FROM tbl_spgist WHERE c1 = 5 OR c3 = 12;
+DROP TABLE tbl_spgist;
+
+/*
+ * 5. Alter column type.
+ */
+CREATE TABLE tbl_spgist (c1 int, c2 int, c3 int, c4 box);
+INSERT INTO tbl_spgist SELECT x, 2*x, 3*x, box(point(x,x+1),point(2*x,2*x+1)) FROM generate_series(1,10) AS x;
+CREATE INDEX tbl_spgist_idx ON tbl_spgist using spgist (c4) INCLUDE (c1,c3);
+ALTER TABLE tbl_spgist ALTER c1 TYPE bigint;
+ALTER TABLE tbl_spgist ALTER c3 TYPE bigint;
+\d tbl_spgist
+DROP TABLE tbl_spgist;
+
--
2.28.0
I added changes in documentation into the patch.
--
Best regards,
Pavel Borisov
Postgres Professional: http://postgrespro.com <http://www.postgrespro.com>
Attachments:
v4-0001-Covering-SP-GiST-index-support-for-INCLUDE-column.patchapplication/octet-stream; name=v4-0001-Covering-SP-GiST-index-support-for-INCLUDE-column.patchDownload
From f7d2a623b2c36ef69eb3c5b74fe2619e5f054e49 Mon Sep 17 00:00:00 2001
From: Pavel Borisov <pashkin.elfe@gmail.com>
Date: Tue, 11 Aug 2020 12:05:51 +0400
Subject: [PATCH v4] Covering SP-GiST index - support for INCLUDE columns
Adding INCLUDE colums for SPGiST index is intended to increase the speed of queries by making scan index only likewise
in btree and GiST index. These included values are added only to leaf tuples and they are not used in index tree search
but they can be fetched during index scan.
The other point of included columns is to overcome SP-GiST limitation of being single-column in principle. I.e. in
certain cases a single covering SP-GiST index can replace several separate ones with less disk space and shared buffers
consumption, faster update etc. Also there can be included any data types without SP-GiST supported opclasses.
Discussion: https://www.postgresql.org/message-id/flat/CALT9ZEFi-vMp4faht9f9Junb1nO3NOSjhpxTmbm1UGLMsLqiEQ@mail.gmail.com
---
doc/src/sgml/indices.sgml | 4 +-
doc/src/sgml/ref/create_index.sgml | 4 +-
doc/src/sgml/spgist.sgml | 8 +
src/backend/access/spgist/README | 2 +-
src/backend/access/spgist/spgdoinsert.c | 172 +++++---
src/backend/access/spgist/spginsert.c | 5 +-
src/backend/access/spgist/spgscan.c | 87 +++-
src/backend/access/spgist/spgutils.c | 382 ++++++++++++++++--
src/backend/access/spgist/spgvacuum.c | 25 +-
src/backend/access/spgist/spgxlog.c | 6 +-
src/include/access/spgist_private.h | 160 +++++---
src/test/regress/expected/amutils.out | 4 +-
src/test/regress/expected/index_including.out | 1 -
.../expected/index_including_spgist.out | 139 +++++++
src/test/regress/parallel_schedule | 2 +-
src/test/regress/serial_schedule | 1 +
.../regress/sql/index_including_spgist.sql | 81 ++++
17 files changed, 898 insertions(+), 185 deletions(-)
create mode 100644 src/test/regress/expected/index_including_spgist.out
create mode 100644 src/test/regress/sql/index_including_spgist.sql
diff --git a/doc/src/sgml/indices.sgml b/doc/src/sgml/indices.sgml
index 28adaba72d..c89cc6cb08 100644
--- a/doc/src/sgml/indices.sgml
+++ b/doc/src/sgml/indices.sgml
@@ -1194,8 +1194,8 @@ CREATE UNIQUE INDEX tab_x_y ON tab(x) INCLUDE (y);
likely to not need to access the heap. If the heap tuple must be visited
anyway, it costs nothing more to get the column's value from there.
Other restrictions are that expressions are not currently supported as
- included columns, and that only B-tree and GiST indexes currently support
- included columns.
+ included columns, and that only B-tree, GiST and SP-GiST indexes currently
+ support included columns.
</para>
<para>
diff --git a/doc/src/sgml/ref/create_index.sgml b/doc/src/sgml/ref/create_index.sgml
index ff87b2d28f..3d360bcf47 100644
--- a/doc/src/sgml/ref/create_index.sgml
+++ b/doc/src/sgml/ref/create_index.sgml
@@ -187,8 +187,8 @@ CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ [ IF NOT EXISTS ] <replaceable class=
</para>
<para>
- Currently, the B-tree and the GiST index access methods support this
- feature. In B-tree and the GiST indexes, the values of columns listed
+ Currently, the B-tree, GiST and SP-GiST index access methods support
+ this feature. In these indexes, the values of columns listed
in the <literal>INCLUDE</literal> clause are included in leaf tuples
which correspond to heap tuples, but are not included in upper-level
index entries used for tree navigation.
diff --git a/doc/src/sgml/spgist.sgml b/doc/src/sgml/spgist.sgml
index 0e04a08679..868a140a6a 100644
--- a/doc/src/sgml/spgist.sgml
+++ b/doc/src/sgml/spgist.sgml
@@ -240,6 +240,14 @@
inner tuples that are passed through to reach the leaf level.
</para>
+ <para>
+ In case when <acronym>SP-GiST</acronym> index is created with
+ <literal>INCLUDE</literal> clause i.e. covering index, leaf tuples also
+ contain data from included columns. This data is stored uncompressed and can have
+ data types without any SP-GiST operator class.
+
+ </para>
+
<para>
Inner tuples are more complex, since they are branching points in the
search tree. Each inner tuple contains a set of one or more
diff --git a/src/backend/access/spgist/README b/src/backend/access/spgist/README
index b55b073832..87e08431fa 100644
--- a/src/backend/access/spgist/README
+++ b/src/backend/access/spgist/README
@@ -73,8 +73,8 @@ Leaf tuple consists of:
Example:
radix tree - the rest of string (postfix)
quad and k-d tree - the point itself
-
ItemPointer to the heap
+ optional included colums values
NULLS HANDLING
diff --git a/src/backend/access/spgist/spgdoinsert.c b/src/backend/access/spgist/spgdoinsert.c
index 934d65b89f..4c133b7106 100644
--- a/src/backend/access/spgist/spgdoinsert.c
+++ b/src/backend/access/spgist/spgdoinsert.c
@@ -22,7 +22,7 @@
#include "miscadmin.h"
#include "storage/bufmgr.h"
#include "utils/rel.h"
-
+#include "access/htup_details.h"
/*
* SPPageDesc tracks all info about a page we are inserting into. In some
@@ -220,7 +220,7 @@ addLeafTuple(Relation index, SpGistState *state, SpGistLeafTuple leafTuple,
SpGistBlockIsRoot(current->blkno))
{
/* Tuple is not part of a chain */
- leafTuple->nextOffset = InvalidOffsetNumber;
+ SGLT_SET_OFFSET(leafTuple->nextOffset, InvalidOffsetNumber);
current->offnum = SpGistPageAddNewItem(state, current->page,
(Item) leafTuple, leafTuple->size,
NULL, false);
@@ -253,7 +253,7 @@ addLeafTuple(Relation index, SpGistState *state, SpGistLeafTuple leafTuple,
PageGetItemId(current->page, current->offnum));
if (head->tupstate == SPGIST_LIVE)
{
- leafTuple->nextOffset = head->nextOffset;
+ SGLT_SET_OFFSET(leafTuple->nextOffset, SGLT_GET_OFFSET(head->nextOffset));
offnum = SpGistPageAddNewItem(state, current->page,
(Item) leafTuple, leafTuple->size,
NULL, false);
@@ -264,14 +264,14 @@ addLeafTuple(Relation index, SpGistState *state, SpGistLeafTuple leafTuple,
*/
head = (SpGistLeafTuple) PageGetItem(current->page,
PageGetItemId(current->page, current->offnum));
- head->nextOffset = offnum;
+ SGLT_SET_OFFSET(head->nextOffset, offnum);
xlrec.offnumLeaf = offnum;
xlrec.offnumHeadLeaf = current->offnum;
}
else if (head->tupstate == SPGIST_DEAD)
{
- leafTuple->nextOffset = InvalidOffsetNumber;
+ SGLT_SET_OFFSET(leafTuple->nextOffset, InvalidOffsetNumber);
PageIndexTupleDelete(current->page, current->offnum);
if (PageAddItem(current->page,
(Item) leafTuple, leafTuple->size,
@@ -362,13 +362,13 @@ checkSplitConditions(Relation index, SpGistState *state,
{
/* We could see a DEAD tuple as first/only chain item */
Assert(i == current->offnum);
- Assert(it->nextOffset == InvalidOffsetNumber);
+ Assert(SGLT_GET_OFFSET(it->nextOffset) == InvalidOffsetNumber);
/* Don't count it in result, because it won't go to other page */
}
else
elog(ERROR, "unexpected SPGiST tuple state: %d", it->tupstate);
- i = it->nextOffset;
+ i = SGLT_GET_OFFSET(it->nextOffset);
}
*nToSplit = n;
@@ -437,7 +437,7 @@ moveLeafs(Relation index, SpGistState *state,
{
/* We could see a DEAD tuple as first/only chain item */
Assert(i == current->offnum);
- Assert(it->nextOffset == InvalidOffsetNumber);
+ Assert(SGLT_GET_OFFSET(it->nextOffset) == InvalidOffsetNumber);
/* We don't want to move it, so don't count it in size */
toDelete[nDelete] = i;
nDelete++;
@@ -446,7 +446,7 @@ moveLeafs(Relation index, SpGistState *state,
else
elog(ERROR, "unexpected SPGiST tuple state: %d", it->tupstate);
- i = it->nextOffset;
+ i = SGLT_GET_OFFSET(it->nextOffset);
}
/* Find a leaf page that will hold them */
@@ -475,7 +475,7 @@ moveLeafs(Relation index, SpGistState *state,
* don't care). We're modifying the tuple on the source page
* here, but it's okay since we're about to delete it.
*/
- it->nextOffset = r;
+ SGLT_SET_OFFSET(it->nextOffset, r);
r = SpGistPageAddNewItem(state, npage, (Item) it, it->size,
&startOffset, false);
@@ -490,7 +490,7 @@ moveLeafs(Relation index, SpGistState *state,
}
/* add the new tuple as well */
- newLeafTuple->nextOffset = r;
+ SGLT_SET_OFFSET(newLeafTuple->nextOffset, r);
r = SpGistPageAddNewItem(state, npage,
(Item) newLeafTuple, newLeafTuple->size,
&startOffset, false);
@@ -709,6 +709,9 @@ doPickSplit(Relation index, SpGistState *state,
int nToDelete,
nToInsert,
maxToInclude;
+ Datum *leafChainDatums;
+ bool *leafChainIsnulls;
+ const int natts = IndexRelationGetNumberOfAttributes(index);
in.level = level;
@@ -723,14 +726,16 @@ doPickSplit(Relation index, SpGistState *state,
toInsert = (OffsetNumber *) palloc(sizeof(OffsetNumber) * n);
newLeafs = (SpGistLeafTuple *) palloc(sizeof(SpGistLeafTuple) * n);
leafPageSelect = (uint8 *) palloc(sizeof(uint8) * n);
-
STORE_STATE(state, xlrec.stateSrc);
+ leafChainDatums = (Datum *) palloc(n * natts * sizeof(Datum));
+ leafChainIsnulls = (bool *) palloc(n * natts * sizeof(bool));
+
/*
- * Form list of leaf tuples which will be distributed as split result;
- * also, count up the amount of space that will be freed from current.
- * (Note that in the non-root case, we won't actually delete the old
- * tuples, only replace them with redirects or placeholders.)
+ * Collect leaf tuples which will be distributed as split result; also,
+ * count up the amount of space that will be freed from current. (Note
+ * that in the non-root case, we won't actually delete the old tuples,
+ * only replace them with redirects or placeholders.)
*
* Note: the SGLTDATUM calls here are safe even when dealing with a nulls
* page. For a pass-by-value data type we will fetch a word that must
@@ -738,7 +743,15 @@ doPickSplit(Relation index, SpGistState *state,
* tuples must have size at least SGDTSIZE). For a pass-by-reference type
* we are just computing a pointer that isn't going to get dereferenced.
* So it's not worth guarding the calls with isNulls checks.
+ *
+ * Datums and isnulls of all leaf tuple attributes in a chain are
+ * collected into 2-d arrays: (number of tuples in chain) x (number of
+ * attributes) First attribute is key, the other - included attributes (if
+ * any). After picksplit we need to form new leaf tuples as key attribute
+ * length can change which can affect alignment of every include
+ * attribute.
*/
+
nToInsert = 0;
nToDelete = 0;
spaceToDelete = 0;
@@ -759,6 +772,8 @@ doPickSplit(Relation index, SpGistState *state,
{
in.datums[nToInsert] = SGLTDATUM(it, state);
heapPtrs[nToInsert] = it->heapPtr;
+ SpGistDeformLeafTuple(it, state, leafChainDatums + nToInsert * natts,
+ leafChainIsnulls + nToInsert * natts, isNulls);
nToInsert++;
toDelete[nToDelete] = i;
nToDelete++;
@@ -784,6 +799,9 @@ doPickSplit(Relation index, SpGistState *state,
{
in.datums[nToInsert] = SGLTDATUM(it, state);
heapPtrs[nToInsert] = it->heapPtr;
+
+ SpGistDeformLeafTuple(it, state, leafChainDatums + nToInsert * natts,
+ leafChainIsnulls + nToInsert * natts, isNulls);
nToInsert++;
toDelete[nToDelete] = i;
nToDelete++;
@@ -795,7 +813,7 @@ doPickSplit(Relation index, SpGistState *state,
{
/* We could see a DEAD tuple as first/only chain item */
Assert(i == current->offnum);
- Assert(it->nextOffset == InvalidOffsetNumber);
+ Assert(SGLT_GET_OFFSET(it->nextOffset) == InvalidOffsetNumber);
toDelete[nToDelete] = i;
nToDelete++;
/* replacing it with redirect will save no space */
@@ -803,7 +821,7 @@ doPickSplit(Relation index, SpGistState *state,
else
elog(ERROR, "unexpected SPGiST tuple state: %d", it->tupstate);
- i = it->nextOffset;
+ i = SGLT_GET_OFFSET(it->nextOffset);
}
}
in.nTuples = nToInsert;
@@ -816,10 +834,17 @@ doPickSplit(Relation index, SpGistState *state,
*/
in.datums[in.nTuples] = SGLTDATUM(newLeafTuple, state);
heapPtrs[in.nTuples] = newLeafTuple->heapPtr;
+
+ SpGistDeformLeafTuple(newLeafTuple, state, leafChainDatums + (in.nTuples) * natts,
+ leafChainIsnulls + (in.nTuples) * natts, isNulls);
in.nTuples++;
memset(&out, 0, sizeof(out));
+ /*
+ * Process collected key values of tuples from the chain. Included values
+ * are used to build fresh leaf tuples unchanged.
+ */
if (!isNulls)
{
/*
@@ -837,9 +862,11 @@ doPickSplit(Relation index, SpGistState *state,
totalLeafSizes = 0;
for (i = 0; i < in.nTuples; i++)
{
- newLeafs[i] = spgFormLeafTuple(state, heapPtrs + i,
- out.leafTupleDatums[i],
- false);
+ *(leafChainDatums + i * natts) = (Datum) out.leafTupleDatums[i];
+ *(leafChainIsnulls + i * natts) = false;
+
+ newLeafs[i] = spgFormLeafTuple(state, heapPtrs + i, leafChainDatums + i * natts,
+ leafChainIsnulls + i * natts);
totalLeafSizes += newLeafs[i]->size + sizeof(ItemIdData);
}
}
@@ -860,9 +887,14 @@ doPickSplit(Relation index, SpGistState *state,
totalLeafSizes = 0;
for (i = 0; i < in.nTuples; i++)
{
- newLeafs[i] = spgFormLeafTuple(state, heapPtrs + i,
- (Datum) 0,
- true);
+ /*
+ * Nulls tree can contain only null key values.
+ */
+ *(leafChainDatums + i * natts) = (Datum) 0;
+ *(leafChainIsnulls + i * natts) = true;
+
+ newLeafs[i] = spgFormLeafTuple(state, heapPtrs + i, leafChainDatums + i * natts,
+ leafChainIsnulls + i * natts);
totalLeafSizes += newLeafs[i]->size + sizeof(ItemIdData);
}
}
@@ -1196,10 +1228,10 @@ doPickSplit(Relation index, SpGistState *state,
if (ItemPointerIsValid(&nodes[n]->t_tid))
{
Assert(ItemPointerGetBlockNumber(&nodes[n]->t_tid) == leafBlock);
- it->nextOffset = ItemPointerGetOffsetNumber(&nodes[n]->t_tid);
+ SGLT_SET_OFFSET(it->nextOffset, ItemPointerGetOffsetNumber(&nodes[n]->t_tid));
}
else
- it->nextOffset = InvalidOffsetNumber;
+ SGLT_SET_OFFSET(it->nextOffset, InvalidOffsetNumber);
/* Insert it on page */
newoffset = SpGistPageAddNewItem(state, BufferGetPage(leafBuffer),
@@ -1889,67 +1921,83 @@ spgSplitNodeAction(Relation index, SpGistState *state,
*/
bool
spgdoinsert(Relation index, SpGistState *state,
- ItemPointer heapPtr, Datum datum, bool isnull)
+ ItemPointer heapPtr, Datum *datum, bool *isnull)
{
int level = 0;
- Datum leafDatum;
+ Datum *leafDatum;
int leafSize;
SPPageDesc current,
parent;
FmgrInfo *procinfo = NULL;
+ int i;
/*
* Look up FmgrInfo of the user-defined choose function once, to save
* cycles in the loop below.
*/
- if (!isnull)
+ if (!isnull[0])
procinfo = index_getprocinfo(index, 1, SPGIST_CHOOSE_PROC);
/*
* Prepare the leaf datum to insert.
- *
+ */
+
+ leafDatum = (Datum *) palloc0(sizeof(Datum) * (IndexRelationGetNumberOfAttributes(index)));
+
+ /*
* If an optional "compress" method is provided, then call it to form the
- * leaf datum from the input datum. Otherwise store the input datum as
- * is. Since we don't use index_form_tuple in this AM, we have to make
- * sure value to be inserted is not toasted; FormIndexDatum doesn't
- * guarantee that. But we assume the "compress" method to return an
- * untoasted value.
+ * key datum from the input datum. Otherwise store the input datum as is.
+ * Since we don't use index_form_tuple in this AM, we have to make sure
+ * value to be inserted is not toasted; FormIndexDatum doesn't guarantee
+ * that. But we assume the "compress" method to return an untoasted
+ * value.
*/
- if (!isnull)
+ if (!isnull[0])
{
if (OidIsValid(index_getprocid(index, 1, SPGIST_COMPRESS_PROC)))
{
FmgrInfo *compressProcinfo = NULL;
compressProcinfo = index_getprocinfo(index, 1, SPGIST_COMPRESS_PROC);
- leafDatum = FunctionCall1Coll(compressProcinfo,
- index->rd_indcollation[0],
- datum);
+ leafDatum[0] = FunctionCall1Coll(compressProcinfo,
+ index->rd_indcollation[0],
+ datum[0]);
}
else
{
Assert(state->attLeafType.type == state->attType.type);
if (state->attType.attlen == -1)
- leafDatum = PointerGetDatum(PG_DETOAST_DATUM(datum));
+ leafDatum[0] = PointerGetDatum(PG_DETOAST_DATUM(datum[0]));
else
- leafDatum = datum;
+ leafDatum[0] = datum[0];
}
}
else
- leafDatum = (Datum) 0;
+ leafDatum[0] = (Datum) 0;
+
+ for (i = 1; i < IndexRelationGetNumberOfAttributes(index); i++)
+ {
+ if (!isnull[i])
+ {
+ if (TupleDescAttr(state->includeTupdesc, i - 1)->attlen == -1)
+ leafDatum[i] = PointerGetDatum(PG_DETOAST_DATUM(datum[i]));
+ else
+ leafDatum[i] = datum[i];
+ }
+ else
+ leafDatum[i] = (Datum) 0;
+ }
+
/*
- * Compute space needed for a leaf tuple containing the given datum.
+ * Compute space needed on a page for a leaf tuple containing the given
+ * datum.
*
* If it isn't gonna fit, and the opclass can't reduce the datum size by
* suffixing, bail out now rather than getting into an endless loop.
*/
- if (!isnull)
- leafSize = SGLTHDRSZ + sizeof(ItemIdData) +
- SpGistGetTypeSize(&state->attLeafType, leafDatum);
- else
- leafSize = SGDTSIZE + sizeof(ItemIdData);
+ leafSize = SpgLeafSize(state, leafDatum, isnull) + sizeof(ItemIdData);
if (leafSize > SPGIST_PAGE_CAPACITY && !state->config.longValuesOK)
ereport(ERROR,
@@ -1961,7 +2009,7 @@ spgdoinsert(Relation index, SpGistState *state,
errhint("Values larger than a buffer page cannot be indexed.")));
/* Initialize "current" to the appropriate root page */
- current.blkno = isnull ? SPGIST_NULL_BLKNO : SPGIST_ROOT_BLKNO;
+ current.blkno = isnull[0] ? SPGIST_NULL_BLKNO : SPGIST_ROOT_BLKNO;
current.buffer = InvalidBuffer;
current.page = NULL;
current.offnum = FirstOffsetNumber;
@@ -1995,7 +2043,7 @@ spgdoinsert(Relation index, SpGistState *state,
*/
current.buffer =
SpGistGetBuffer(index,
- GBUF_LEAF | (isnull ? GBUF_NULLS : 0),
+ GBUF_LEAF | (isnull[0] ? GBUF_NULLS : 0),
Min(leafSize, SPGIST_PAGE_CAPACITY),
&isNew);
current.blkno = BufferGetBlockNumber(current.buffer);
@@ -2037,7 +2085,7 @@ spgdoinsert(Relation index, SpGistState *state,
current.page = BufferGetPage(current.buffer);
/* should not arrive at a page of the wrong type */
- if (isnull ? !SpGistPageStoresNulls(current.page) :
+ if (isnull[0] ? !SpGistPageStoresNulls(current.page) :
SpGistPageStoresNulls(current.page))
elog(ERROR, "SPGiST index page %u has wrong nulls flag",
current.blkno);
@@ -2054,7 +2102,7 @@ spgdoinsert(Relation index, SpGistState *state,
{
/* it fits on page, so insert it and we're done */
addLeafTuple(index, state, leafTuple,
- ¤t, &parent, isnull, isNew);
+ ¤t, &parent, isnull[0], isNew);
break;
}
else if ((sizeToSplit =
@@ -2068,14 +2116,14 @@ spgdoinsert(Relation index, SpGistState *state,
* chain to another leaf page rather than splitting it.
*/
Assert(!isNew);
- moveLeafs(index, state, ¤t, &parent, leafTuple, isnull);
+ moveLeafs(index, state, ¤t, &parent, leafTuple, isnull[0]);
break; /* we're done */
}
else
{
/* picksplit */
if (doPickSplit(index, state, ¤t, &parent,
- leafTuple, level, isnull, isNew))
+ leafTuple, level, isnull[0], isNew))
break; /* doPickSplit installed new tuples */
/* leaf tuple will not be inserted yet */
@@ -2110,8 +2158,8 @@ spgdoinsert(Relation index, SpGistState *state,
innerTuple = (SpGistInnerTuple) PageGetItem(current.page,
PageGetItemId(current.page, current.offnum));
- in.datum = datum;
- in.leafDatum = leafDatum;
+ in.datum = datum[0];
+ in.leafDatum = leafDatum[0];
in.level = level;
in.allTheSame = innerTuple->allTheSame;
in.hasPrefix = (innerTuple->prefixSize > 0);
@@ -2121,7 +2169,7 @@ spgdoinsert(Relation index, SpGistState *state,
memset(&out, 0, sizeof(out));
- if (!isnull)
+ if (!isnull[0])
{
/* use user-defined choose method */
FunctionCall2Coll(procinfo,
@@ -2158,11 +2206,11 @@ spgdoinsert(Relation index, SpGistState *state,
/* Adjust level as per opclass request */
level += out.result.matchNode.levelAdd;
/* Replace leafDatum and recompute leafSize */
- if (!isnull)
+ if (!isnull[0])
{
- leafDatum = out.result.matchNode.restDatum;
- leafSize = SGLTHDRSZ + sizeof(ItemIdData) +
- SpGistGetTypeSize(&state->attLeafType, leafDatum);
+ leafDatum[0] = out.result.matchNode.restDatum;
+ leafSize = SpgLeafSize(state, leafDatum, isnull) +
+ sizeof(ItemIdData);
}
/*
@@ -2227,6 +2275,6 @@ spgdoinsert(Relation index, SpGistState *state,
SpGistSetLastUsedPage(index, parent.buffer);
UnlockReleaseBuffer(parent.buffer);
}
-
+ pfree(leafDatum);
return true;
}
diff --git a/src/backend/access/spgist/spginsert.c b/src/backend/access/spgist/spginsert.c
index e4508a2b92..b54ae85f6e 100644
--- a/src/backend/access/spgist/spginsert.c
+++ b/src/backend/access/spgist/spginsert.c
@@ -55,8 +55,7 @@ spgistBuildCallback(Relation index, ItemPointer tid, Datum *values,
* lock on some buffer. So we need to be willing to retry. We can flush
* any temp data when retrying.
*/
- while (!spgdoinsert(index, &buildstate->spgstate, tid,
- *values, *isnull))
+ while (!spgdoinsert(index, &buildstate->spgstate, tid, values, isnull))
{
MemoryContextReset(buildstate->tmpCtx);
}
@@ -226,7 +225,7 @@ spginsert(Relation index, Datum *values, bool *isnull,
* to avoid cumulative memory consumption. That means we also have to
* redo initSpGistState(), but it's cheap enough not to matter.
*/
- while (!spgdoinsert(index, &spgstate, ht_ctid, *values, *isnull))
+ while (!spgdoinsert(index, &spgstate, ht_ctid, values, isnull))
{
MemoryContextReset(insertCtx);
initSpGistState(&spgstate, index);
diff --git a/src/backend/access/spgist/spgscan.c b/src/backend/access/spgist/spgscan.c
index 4d506bfb9a..5a3c7c50cf 100644
--- a/src/backend/access/spgist/spgscan.c
+++ b/src/backend/access/spgist/spgscan.c
@@ -28,7 +28,8 @@
typedef void (*storeRes_func) (SpGistScanOpaque so, ItemPointer heapPtr,
Datum leafValue, bool isNull, bool recheck,
- bool recheckDistances, double *distances);
+ bool recheckDistances, double *distances,
+ SpGistLeafTuple leafTuple);
/*
* Pairing heap comparison function for the SpGistSearchItem queue.
@@ -88,6 +89,9 @@ spgFreeSearchItem(SpGistScanOpaque so, SpGistSearchItem *item)
if (item->traversalValue)
pfree(item->traversalValue);
+ if (item->isLeaf && item->leafTuple)
+ pfree(item->leafTuple);
+
pfree(item);
}
@@ -134,6 +138,8 @@ spgAddStartItem(SpGistScanOpaque so, bool isnull)
startEntry->recheck = false;
startEntry->recheckDistances = false;
+ startEntry->leafTuple = NULL;
+
spgAddSearchItemToQueue(so, startEntry);
}
@@ -438,14 +444,30 @@ spgendscan(IndexScanDesc scan)
* Leaf SpGistSearchItem constructor, called in queue context
*/
static SpGistSearchItem *
-spgNewHeapItem(SpGistScanOpaque so, int level, ItemPointer heapPtr,
+spgNewHeapItem(SpGistScanOpaque so, int level, SpGistLeafTuple leafTuple,
Datum leafValue, bool recheck, bool recheckDistances,
bool isnull, double *distances)
{
SpGistSearchItem *item = spgAllocSearchItem(so, isnull, distances);
+ /*
+ * If there are include attributes search item in the queue should contain
+ * them.
+ */
+ if (so->state.includeTupdesc)
+ {
+ Assert(so->state.includeTupdesc->natts);
+
+ item->leafTuple = palloc(leafTuple->size);
+ memcpy(item->leafTuple, leafTuple, leafTuple->size);
+ }
+ else
+ {
+ item->leafTuple = NULL;
+ }
+
item->level = level;
- item->heapPtr = *heapPtr;
+ item->heapPtr = leafTuple->heapPtr;
/* copy value to queue cxt out of tmp cxt */
item->value = isnull ? (Datum) 0 :
datumCopy(leafValue, so->state.attLeafType.attbyval,
@@ -503,6 +525,8 @@ spgLeafTest(SpGistScanOpaque so, SpGistSearchItem *item,
in.returnData = so->want_itup;
in.leafDatum = SGLTDATUM(leafTuple, &so->state);
+
+
out.leafValue = (Datum) 0;
out.recheck = false;
out.distances = NULL;
@@ -528,7 +552,7 @@ spgLeafTest(SpGistScanOpaque so, SpGistSearchItem *item,
/* the scan is ordered -> add the item to the queue */
MemoryContext oldCxt = MemoryContextSwitchTo(so->traversalCxt);
SpGistSearchItem *heapItem = spgNewHeapItem(so, item->level,
- &leafTuple->heapPtr,
+ leafTuple,
leafValue,
recheck,
recheckDistances,
@@ -543,8 +567,10 @@ spgLeafTest(SpGistScanOpaque so, SpGistSearchItem *item,
{
/* non-ordered scan, so report the item right away */
Assert(!recheckDistances);
+
storeRes(so, &leafTuple->heapPtr, leafValue, isnull,
- recheck, false, NULL);
+ recheck, false, NULL, leafTuple);
+
*reportedSome = true;
}
}
@@ -736,7 +762,7 @@ spgTestLeafTuple(SpGistScanOpaque so,
/* dead tuple should be first in chain */
Assert(offset == ItemPointerGetOffsetNumber(&item->heapPtr));
/* No live entries on this page */
- Assert(leafTuple->nextOffset == InvalidOffsetNumber);
+ Assert(SGLT_GET_OFFSET(leafTuple->nextOffset) == InvalidOffsetNumber);
return SpGistBreakOffsetNumber;
}
}
@@ -750,7 +776,7 @@ spgTestLeafTuple(SpGistScanOpaque so,
spgLeafTest(so, item, leafTuple, isnull, reportedSome, storeRes);
- return leafTuple->nextOffset;
+ return SGLT_GET_OFFSET(leafTuple->nextOffset);
}
/*
@@ -782,8 +808,8 @@ redirect:
{
/* We store heap items in the queue only in case of ordered search */
Assert(so->numberOfNonNullOrderBys > 0);
- storeRes(so, &item->heapPtr, item->value, item->isNull,
- item->recheck, item->recheckDistances, item->distances);
+ storeRes(so, &item->heapPtr, item->value, item->isNull, item->recheck,
+ item->recheckDistances, item->distances, item->leafTuple);
reportedSome = true;
}
else
@@ -877,7 +903,7 @@ redirect:
static void
storeBitmap(SpGistScanOpaque so, ItemPointer heapPtr,
Datum leafValue, bool isnull, bool recheck, bool recheckDistances,
- double *distances)
+ double *distances, SpGistLeafTuple leafTuple)
{
Assert(!recheckDistances && !distances);
tbm_add_tuples(so->tbm, heapPtr, 1, recheck);
@@ -904,7 +930,7 @@ spggetbitmap(IndexScanDesc scan, TIDBitmap *tbm)
static void
storeGettuple(SpGistScanOpaque so, ItemPointer heapPtr,
Datum leafValue, bool isnull, bool recheck, bool recheckDistances,
- double *nonNullDistances)
+ double *nonNullDistances, SpGistLeafTuple leafTuple)
{
Assert(so->nPtrs < MaxIndexTuplesPerPage);
so->heapPtrs[so->nPtrs] = *heapPtr;
@@ -949,9 +975,38 @@ storeGettuple(SpGistScanOpaque so, ItemPointer heapPtr,
* Reconstruct index data. We have to copy the datum out of the temp
* context anyway, so we may as well create the tuple here.
*/
- so->reconTups[so->nPtrs] = heap_form_tuple(so->indexTupDesc,
- &leafValue,
- &isnull);
+ if (so->state.includeTupdesc)
+ {
+ /* Add included attributes */
+ Datum *leafDatums;
+ bool *leafIsnulls;
+
+ Assert(so->state.includeTupdesc->natts);
+
+ leafDatums = (Datum *) palloc(sizeof(Datum) * (so->state.includeTupdesc->natts + 1));
+ leafIsnulls = (bool *) palloc(sizeof(bool) * (so->state.includeTupdesc->natts + 1));
+
+ SpGistDeformLeafTuple(leafTuple, &so->state, leafDatums, leafIsnulls, isnull);
+
+ /*
+ * override key value extracted from LeafTuple in case we've
+ * reconstructed it already
+ */
+ leafDatums[0] = leafValue;
+ leafIsnulls[0] = isnull;
+
+ so->reconTups[so->nPtrs] = heap_form_tuple(so->indexTupDesc,
+ leafDatums,
+ leafIsnulls);
+ pfree(leafDatums);
+ pfree(leafIsnulls);
+ }
+ else
+ {
+ so->reconTups[so->nPtrs] = heap_form_tuple(so->indexTupDesc,
+ &leafValue,
+ &isnull);
+ }
}
so->nPtrs++;
}
@@ -1019,6 +1074,10 @@ spgcanreturn(Relation index, int attno)
{
SpGistCache *cache;
+ /* Included attributes always can be fetched for index-only scans */
+ if (attno > 1)
+ return true;
+
/* We can do it if the opclass config function says so */
cache = spgGetCache(index);
diff --git a/src/backend/access/spgist/spgutils.c b/src/backend/access/spgist/spgutils.c
index 0efe05e552..3ca47ff53d 100644
--- a/src/backend/access/spgist/spgutils.c
+++ b/src/backend/access/spgist/spgutils.c
@@ -31,7 +31,18 @@
#include "utils/index_selfuncs.h"
#include "utils/lsyscache.h"
#include "utils/syscache.h"
+#include "access/itup.h"
+#include "access/detoast.h"
+#include "access/toast_internals.h"
+#include "access/heaptoast.h"
+#include "utils/expandeddatum.h"
+/* Does att's datatype allow packing into the 1-byte-header varlena format? */
+#define ATT_IS_PACKABLE(att) \
+ ((att)->attlen == -1 && (att)->attstorage != TYPSTORAGE_PLAIN)
+
+Size spgIncludedDataSize(TupleDesc tupleDesc, Datum *values,
+ bool *isnull, Size start);
/*
* SP-GiST handler function: return IndexAmRoutine with access method parameters
@@ -49,7 +60,7 @@ spghandler(PG_FUNCTION_ARGS)
amroutine->amcanorderbyop = true;
amroutine->amcanbackward = false;
amroutine->amcanunique = false;
- amroutine->amcanmulticol = false;
+ amroutine->amcanmulticol = true;
amroutine->amoptionalkey = true;
amroutine->amsearcharray = false;
amroutine->amsearchnulls = true;
@@ -57,7 +68,7 @@ spghandler(PG_FUNCTION_ARGS)
amroutine->amclusterable = false;
amroutine->ampredlocks = false;
amroutine->amcanparallel = false;
- amroutine->amcaninclude = false;
+ amroutine->amcaninclude = true;
amroutine->amusemaintenanceworkmem = false;
amroutine->amparallelvacuumoptions =
VACUUM_OPTION_PARALLEL_BULKDEL | VACUUM_OPTION_PARALLEL_COND_CLEANUP;
@@ -116,14 +127,21 @@ spgGetCache(Relation index)
cache = MemoryContextAllocZero(index->rd_indexcxt,
sizeof(SpGistCache));
- /* SPGiST doesn't support multi-column indexes */
- Assert(index->rd_att->natts == 1);
+ /*
+ * SPGiST should have one key column and can also have included
+ * columns
+ */
+ if (IndexRelationGetNumberOfKeyAttributes(index) != 1)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("SPGiST index can have only one key column")));
/*
- * Get the actual data type of the indexed column from the index
- * tupdesc. We pass this to the opclass config function so that
- * polymorphic opclasses are possible.
+ * Get the actual data type of the key column from the index tupdesc.
+ * We pass this to the opclass config function so that polymorphic
+ * opclasses are possible.
*/
+
atttype = TupleDescAttr(index->rd_att, 0)->atttypid;
/* Call the config function to get config info for the opclass */
@@ -156,6 +174,7 @@ spgGetCache(Relation index)
fillTypeDesc(&cache->attPrefixType, cache->config.prefixType);
fillTypeDesc(&cache->attLabelType, cache->config.labelType);
+
/* Last, get the lastUsedPages data from the metapage */
metabuffer = ReadBuffer(index, SPGIST_METAPAGE_BLKNO);
LockBuffer(metabuffer, BUFFER_LOCK_SHARE);
@@ -177,7 +196,23 @@ spgGetCache(Relation index)
/* assume it's up to date */
cache = (SpGistCache *) index->rd_amcache;
}
+ /* Form descriptor for included columns if any */
+ if (IndexRelationGetNumberOfAttributes(index) > 1)
+ {
+ int i;
+
+ cache->includeTupdesc = CreateTemplateTupleDesc(
+ IndexRelationGetNumberOfAttributes(index) - 1);
+ for (i = 0; i < IndexRelationGetNumberOfAttributes(index) - 1; i++)
+ {
+ TupleDescInitEntry(cache->includeTupdesc, i + 1, NULL,
+ TupleDescAttr(index->rd_att, i + 1)->atttypid,
+ -1, 0);
+ }
+ }
+ else
+ cache->includeTupdesc = NULL;
return cache;
}
@@ -190,6 +225,7 @@ initSpGistState(SpGistState *state, Relation index)
/* Get cached static information about index */
cache = spgGetCache(index);
+ state->includeTupdesc = cache->includeTupdesc;
state->config = cache->config;
state->attType = cache->attType;
state->attLeafType = cache->attLeafType;
@@ -603,7 +639,7 @@ spgoptions(Datum reloptions, bool validate)
/*
* Get the space needed to store a non-null datum of the indicated type.
- * Note the result is already rounded up to a MAXALIGN boundary.
+ * Note the result is not maxaligned and this should be done by caller if needed.
* Also, we follow the SPGiST convention that pass-by-val types are
* just stored in their Datum representation (compare memcpyDatum).
*/
@@ -619,7 +655,7 @@ SpGistGetTypeSize(SpGistTypeDesc *att, Datum datum)
else
size = VARSIZE_ANY(datum);
- return MAXALIGN(size);
+ return size;
}
/*
@@ -642,36 +678,202 @@ memcpyDatum(void *target, SpGistTypeDesc *att, Datum datum)
}
/*
- * Construct a leaf tuple containing the given heap TID and datum value
+ * Private version of heap_compute_data_size with start address not
+ * necessarily MAXALIGNed. The reason is that start address (and alignment)
+ * influence alignment of each of next values and overall size of included
+ * data area in SpGiST leaf tuple.
+ */
+Size
+spgIncludedDataSize(TupleDesc tupleDesc,
+ Datum *values,
+ bool *isnull, Size start)
+{
+ Size data_length = 0;
+ int i;
+ int numberOfAttributes = tupleDesc->natts;
+
+ data_length = start;
+ for (i = 0; i < numberOfAttributes; i++)
+ {
+ Datum val;
+ Form_pg_attribute atti;
+
+ if (isnull[i])
+ continue;
+
+ val = values[i];
+ atti = TupleDescAttr(tupleDesc, i);
+
+ if (ATT_IS_PACKABLE(atti) &&
+ VARATT_CAN_MAKE_SHORT(DatumGetPointer(val)))
+ {
+ /*
+ * we're anticipating converting to a short varlena header, so
+ * adjust length and don't count any alignment
+ */
+ data_length += VARATT_CONVERTED_SHORT_SIZE(DatumGetPointer(val));
+ }
+ else if (atti->attlen == -1 &&
+ VARATT_IS_EXTERNAL_EXPANDED(DatumGetPointer(val)))
+ {
+ /*
+ * we want to flatten the expanded value so that the constructed
+ * tuple doesn't depend on it
+ */
+ data_length = att_align_nominal(data_length, atti->attalign);
+ data_length += EOH_get_flat_size(DatumGetEOHP(val));
+ }
+ else
+ {
+ data_length = att_align_datum(data_length, atti->attalign,
+ atti->attlen, val);
+ data_length = att_addlength_datum(data_length, atti->attlen,
+ val);
+ }
+ }
+ return data_length - start;
+}
+
+/* Calculate overall leaf tuple size. SGLTHDRSZ is MAXALIGNed only for backward
+ * compatibility and there might be gap between header and key data. After key
+ * data there are no such gaps more than is is necessary for each value
+ * alignment. Overall result is MAXALIGNed.*/
+unsigned int
+SpgLeafSize(SpGistState *state, Datum *datum, bool *isnull)
+{
+ /* compute space needed, nullmask size and offset for include attributes */
+ unsigned int size = SGLTHDRSZ;
+ unsigned int i;
+
+ if (!isnull[0])
+ /* key attribute size (not maxaligned) */
+ size += SpGistGetTypeSize(&state->attLeafType, datum[0]);
+
+ if (state->includeTupdesc)
+ {
+ Assert(state->includeTupdesc->natts);
+ if (state->includeTupdesc->natts + 1 >= INDEX_MAX_KEYS)
+ ereport(ERROR,
+ (errcode(ERRCODE_TOO_MANY_COLUMNS),
+ errmsg("number of index columns (%d) exceeds limit (%d)",
+ state->includeTupdesc->natts, INDEX_MAX_KEYS)));
+ /* nullmask size */
+ for (i = 1; i <= state->includeTupdesc->natts; i++)
+ {
+ if (isnull[i])
+ {
+ size += (state->includeTupdesc->natts / 8) + 1;
+ break;
+ }
+ }
+ /* overall included attributes size each with added proper alignment. */
+ size += spgIncludedDataSize(state->includeTupdesc, datum + 1, isnull + 1, size);
+ }
+ return MAXALIGN(size);
+}
+
+/*
+ * Construct a leaf tuple containing the given heap TID, key data and included
+ * columns data. Key data starts from MAXALIGN boundary for backward compatibility.
+ * Nullmask apply only to included attributes and is placed just after key data if
+ * there is at least one NULL among included attributes. It doesn't need alignment.
+ * Then all included columns data follow aligned by their typealign's.
*/
SpGistLeafTuple
spgFormLeafTuple(SpGistState *state, ItemPointer heapPtr,
- Datum datum, bool isnull)
+ Datum *datum, bool *isnull)
{
SpGistLeafTuple tup;
- unsigned int size;
+ unsigned int size = SGLTHDRSZ;
+ unsigned int include_offset = 0;
+ unsigned int nullmask_size = 0;
+ unsigned int data_offset = 0;
+ unsigned int data_size = 0;
+ uint16 tupmask = 0;
+ int i;
- /* compute space needed (note result is already maxaligned) */
- size = SGLTHDRSZ;
- if (!isnull)
- size += SpGistGetTypeSize(&state->attLeafType, datum);
+ /*
+ * Calculate space needed. If there are include attributes also calculate
+ * sizes and offsets needed for heap_fill_tuple
+ */
+ if (!isnull[0])
+ /* key attribute size (not maxaligned) */
+ size += SpGistGetTypeSize(&state->attLeafType, datum[0]);
+
+ if (state->includeTupdesc)
+ {
+ Assert(state->includeTupdesc->natts);
+ if (state->includeTupdesc->natts + 1 >= INDEX_MAX_KEYS)
+ ereport(ERROR,
+ (errcode(ERRCODE_TOO_MANY_COLUMNS),
+ errmsg("number of index columns (%d) exceeds limit (%d)",
+ state->includeTupdesc->natts, INDEX_MAX_KEYS)));
+
+ include_offset = size;
+
+ for (i = 1; i <= state->includeTupdesc->natts; i++)
+ {
+ if (isnull[i])
+ {
+ nullmask_size = (state->includeTupdesc->natts / 8) + 1;
+ size += nullmask_size;
+ break;
+ }
+ }
+
+ /*
+ * Alignment of all included attributes is counted inside data_size.
+ * data_offset itself is not aligned.
+ */
+ data_size = spgIncludedDataSize(state->includeTupdesc, datum + 1, isnull + 1, size);
+ data_offset = size;
+
+ size += data_size;
+ }
/*
* Ensure that we can replace the tuple with a dead tuple later. This
- * test is unnecessary when !isnull, but let's be safe.
+ * test is unnecessary when !isnull[0], but let's be safe.
*/
if (size < SGDTSIZE)
size = SGDTSIZE;
/* OK, form the tuple */
- tup = (SpGistLeafTuple) palloc0(size);
+ tup = (SpGistLeafTuple) palloc0(MAXALIGN(size));
- tup->size = size;
- tup->nextOffset = InvalidOffsetNumber;
+ tup->size = MAXALIGN(size);
+ SGLT_SET_OFFSET(tup->nextOffset, InvalidOffsetNumber);
tup->heapPtr = *heapPtr;
- if (!isnull)
- memcpyDatum(SGLTDATAPTR(tup), &state->attLeafType, datum);
+ if (!isnull[0])
+ memcpyDatum(SGLTDATAPTR(tup), &state->attLeafType, datum[0]);
+
+ /* Add included columns data to leaf tuple if any. */
+ if (state->includeTupdesc)
+ {
+ /*
+ * The start of include attributes tuple is not aligned by default.
+ * All values alignment should be done by heap_fill_tuple
+ * automaticaly. If there is a nulls mask it is included just after
+ * key attribute data and it should not be aligned.
+ */
+ heap_fill_tuple(state->includeTupdesc, datum + 1, isnull + 1,
+ (char *) tup + data_offset,
+ data_size, &tupmask,
+ (nullmask_size ? (bits8 *) tup + include_offset : NULL));
+
+ if (nullmask_size)
+ SGLT_SET_CONTAINSNULLMASK(tup->nextOffset, 1);
+
+ /*
+ * We do this because heap_fill_tuple wants to initialize a "tupmask"
+ * which is used for HeapTuples, but the only relevant info is the
+ * "has variable attributes" field. We have already set the hasnull
+ * bit above.
+ */
+ if (tupmask & HEAP_HASVARWIDTH)
+ SGLT_SET_CONTAINSVARATT(tup->nextOffset, 1);
+ }
return tup;
}
@@ -688,10 +890,10 @@ spgFormNodeTuple(SpGistState *state, Datum label, bool isnull)
unsigned int size;
unsigned short infomask = 0;
- /* compute space needed (note result is already maxaligned) */
+ /* compute space needed */
size = SGNTHDRSZ;
if (!isnull)
- size += SpGistGetTypeSize(&state->attLabelType, label);
+ size += MAXALIGN(SpGistGetTypeSize(&state->attLabelType, label));
/*
* Here we make sure that the size will fit in the field reserved for it
@@ -735,7 +937,7 @@ spgFormInnerTuple(SpGistState *state, bool hasPrefix, Datum prefix,
/* Compute size needed */
if (hasPrefix)
- prefixSize = SpGistGetTypeSize(&state->attPrefixType, prefix);
+ prefixSize = MAXALIGN(SpGistGetTypeSize(&state->attPrefixType, prefix));
else
prefixSize = 0;
@@ -1046,3 +1248,133 @@ spgproperty(Oid index_oid, int attno,
return true;
}
+
+/*
+ * Convert an SpGist tuple into palloc'd Datum/isnull arrays.
+ *
+ */
+void
+SpGistDeformLeafTuple(SpGistLeafTuple tup, SpGistState *state, Datum *datum, bool *isnull,
+ bool key_isnull)
+{
+ unsigned int include_offset; /* offset of include data */
+ int off;
+ bits8 *nullmask_ptr = NULL; /* ptr to null bitmap in tuple */
+ char *tp;
+ bool slow = false; /* can we use/set attcacheoff? */
+ int i;
+
+ if (key_isnull)
+ {
+ datum[0] = (Datum) 0;
+ isnull[0] = true;
+ }
+ else
+ {
+ datum[0] = SGLTDATUM(tup, state);
+ isnull[0] = false;
+ }
+
+ if (state->includeTupdesc)
+ {
+ Assert(state->includeTupdesc->natts);
+ if (state->includeTupdesc->natts + 1 >= INDEX_MAX_KEYS)
+ ereport(ERROR,
+ (errcode(ERRCODE_TOO_MANY_COLUMNS),
+ errmsg("number of index columns (%d) exceeds limit (%d)",
+ state->includeTupdesc->natts, INDEX_MAX_KEYS)));
+
+ include_offset = key_isnull ? SGLTHDRSZ : SGLTHDRSZ + SpGistGetTypeSize(&state->attLeafType, datum[0]);
+
+ tp = (char *) tup;
+ off = include_offset;
+
+ if (SGLT_GET_CONTAINSNULLMASK(tup->nextOffset))
+ {
+ nullmask_ptr = (bits8 *) tp + include_offset;
+ off += (state->includeTupdesc->natts) / 8 + 1;
+ }
+
+ if (state->attLeafType.attlen > 0 && !SGLT_GET_CONTAINSVARATT(tup->nextOffset) &&
+ !SGLT_GET_CONTAINSNULLMASK(tup->nextOffset))
+ /* can use attcacheoff for all attributes */
+ {
+ for (i = 1; i <= state->includeTupdesc->natts; i++)
+ {
+ Form_pg_attribute thisatt = TupleDescAttr(state->includeTupdesc, i - 1);
+
+ isnull[i] = false;
+ if (thisatt->attcacheoff >= 0)
+ off = thisatt->attcacheoff;
+ else
+ {
+ off = att_align_nominal(off, thisatt->attalign);
+ thisatt->attcacheoff = off;
+ }
+ datum[i] = fetchatt(thisatt, tp + off);
+ off = att_addlength_pointer(off, thisatt->attlen, tp + off);
+ }
+ }
+ else
+
+ /*
+ * general case: can use cache until first null or varlen
+ * attribute
+ */
+ {
+ if (state->attLeafType.attlen <= 0)
+ slow = true; /* can't use attcacheoff at all */
+
+ for (i = 1; i <= state->includeTupdesc->natts; i++)
+ {
+ Form_pg_attribute thisatt = TupleDescAttr(state->includeTupdesc, i - 1);
+
+ if (SGLT_GET_CONTAINSNULLMASK(tup->nextOffset))
+ {
+ if (att_isnull(i - 1, nullmask_ptr))
+ {
+ datum[i] = (Datum) 0;
+ isnull[i] = true;
+ slow = true; /* can't use attcacheoff anymore */
+ continue;
+ }
+ }
+
+ isnull[i] = false;
+
+ if (!slow && thisatt->attcacheoff >= 0)
+ off = thisatt->attcacheoff;
+ else if (thisatt->attlen == -1)
+ {
+ /*
+ * We can only cache the offset for a varlena attribute if
+ * the offset is already suitably aligned, so that there
+ * would be no pad bytes in any case: then the offset will
+ * be valid for either an aligned or unaligned value.
+ */
+ if (!slow && off == att_align_nominal(off, thisatt->attalign))
+ thisatt->attcacheoff = off;
+ else
+ {
+ off = att_align_pointer(off, thisatt->attalign, -1, tp + off);
+ slow = true;
+ }
+ }
+ else
+ {
+ /* not varlena, so safe to use att_align_nominal */
+ off = att_align_nominal(off, thisatt->attalign);
+
+ if (!slow)
+ thisatt->attcacheoff = off;
+ }
+
+ datum[i] = fetchatt(thisatt, tp + off);
+ off = att_addlength_pointer(off, thisatt->attlen, tp + off);
+
+ if (thisatt->attlen <= 0)
+ slow = true; /* can't use attcacheoff anymore */
+ }
+ }
+ }
+}
diff --git a/src/backend/access/spgist/spgvacuum.c b/src/backend/access/spgist/spgvacuum.c
index bd98707f3c..a0d76901fc 100644
--- a/src/backend/access/spgist/spgvacuum.c
+++ b/src/backend/access/spgist/spgvacuum.c
@@ -168,23 +168,28 @@ vacuumLeafPage(spgBulkDeleteState *bds, Relation index, Buffer buffer,
}
/* Form predecessor map, too */
- if (lt->nextOffset != InvalidOffsetNumber)
+ if (SGLT_GET_OFFSET(lt->nextOffset) != InvalidOffsetNumber)
{
/* paranoia about corrupted chain links */
- if (lt->nextOffset < FirstOffsetNumber ||
- lt->nextOffset > max ||
- predecessor[lt->nextOffset] != InvalidOffsetNumber)
+ if (SGLT_GET_OFFSET(lt->nextOffset) < FirstOffsetNumber ||
+ SGLT_GET_OFFSET(lt->nextOffset) > max ||
+ predecessor[SGLT_GET_OFFSET(lt->nextOffset)] != InvalidOffsetNumber)
elog(ERROR, "inconsistent tuple chain links in page %u of index \"%s\"",
BufferGetBlockNumber(buffer),
RelationGetRelationName(index));
- predecessor[lt->nextOffset] = i;
+ predecessor[SGLT_GET_OFFSET(lt->nextOffset)] = i;
}
}
else if (lt->tupstate == SPGIST_REDIRECT)
{
SpGistDeadTuple dt = (SpGistDeadTuple) lt;
- Assert(dt->nextOffset == InvalidOffsetNumber);
+ /*
+ * Dead tuple nextOffset is allowed to have any values of two
+ * highest bits in case it is inherited from SpGistLeafTuple where
+ * these bits has their own meaning.
+ */
+ Assert(SGLT_GET_OFFSET(dt->nextOffset) == InvalidOffsetNumber);
Assert(ItemPointerIsValid(&dt->pointer));
/*
@@ -201,7 +206,7 @@ vacuumLeafPage(spgBulkDeleteState *bds, Relation index, Buffer buffer,
}
else
{
- Assert(lt->nextOffset == InvalidOffsetNumber);
+ Assert(SGLT_GET_OFFSET(lt->nextOffset) == InvalidOffsetNumber);
}
}
@@ -250,7 +255,7 @@ vacuumLeafPage(spgBulkDeleteState *bds, Relation index, Buffer buffer,
prevLive = deletable[i] ? InvalidOffsetNumber : i;
/* scan down the chain ... */
- j = head->nextOffset;
+ j = SGLT_GET_OFFSET(head->nextOffset);
while (j != InvalidOffsetNumber)
{
SpGistLeafTuple lt;
@@ -301,7 +306,7 @@ vacuumLeafPage(spgBulkDeleteState *bds, Relation index, Buffer buffer,
interveningDeletable = false;
}
- j = lt->nextOffset;
+ j = SGLT_GET_OFFSET(lt->nextOffset);
}
if (prevLive == InvalidOffsetNumber)
@@ -366,7 +371,7 @@ vacuumLeafPage(spgBulkDeleteState *bds, Relation index, Buffer buffer,
lt = (SpGistLeafTuple) PageGetItem(page,
PageGetItemId(page, chainSrc[i]));
Assert(lt->tupstate == SPGIST_LIVE);
- lt->nextOffset = chainDest[i];
+ SGLT_SET_OFFSET(lt->nextOffset, chainDest[i]);
}
MarkBufferDirty(buffer);
diff --git a/src/backend/access/spgist/spgxlog.c b/src/backend/access/spgist/spgxlog.c
index 7be2291d07..4022e3af07 100644
--- a/src/backend/access/spgist/spgxlog.c
+++ b/src/backend/access/spgist/spgxlog.c
@@ -122,8 +122,8 @@ spgRedoAddLeaf(XLogReaderState *record)
head = (SpGistLeafTuple) PageGetItem(page,
PageGetItemId(page, xldata->offnumHeadLeaf));
- Assert(head->nextOffset == leafTupleHdr.nextOffset);
- head->nextOffset = xldata->offnumLeaf;
+ Assert(SGLT_GET_OFFSET(head->nextOffset) == SGLT_GET_OFFSET(leafTupleHdr.nextOffset));
+ SGLT_SET_OFFSET(head->nextOffset, xldata->offnumLeaf);
}
}
else
@@ -822,7 +822,7 @@ spgRedoVacuumLeaf(XLogReaderState *record)
lt = (SpGistLeafTuple) PageGetItem(page,
PageGetItemId(page, chainSrc[i]));
Assert(lt->tupstate == SPGIST_LIVE);
- lt->nextOffset = chainDest[i];
+ SGLT_SET_OFFSET(lt->nextOffset, chainDest[i]);
}
PageSetLSN(page, lsn);
diff --git a/src/include/access/spgist_private.h b/src/include/access/spgist_private.h
index 00b98ec6a0..8d03adb8f5 100644
--- a/src/include/access/spgist_private.h
+++ b/src/include/access/spgist_private.h
@@ -141,6 +141,7 @@ typedef struct SpGistState
SpGistTypeDesc attLeafType; /* type of leaf-tuple values */
SpGistTypeDesc attPrefixType; /* type of inner-tuple prefix values */
SpGistTypeDesc attLabelType; /* type of node label values */
+ TupleDesc includeTupdesc; /* tuple descriptor of included columns */
char *deadTupleStorage; /* workspace for spgFormDeadTuple */
@@ -148,6 +149,98 @@ typedef struct SpGistState
bool isBuild; /* true if doing index build */
} SpGistState;
+/*
+ * SPGiST leaf tuple: carries a datum and a heap tuple TID
+ *
+ * In the simplest case, the datum is the same as the indexed value; but
+ * it could also be a suffix or some other sort of delta that permits
+ * reconstruction given knowledge of the prefix path traversed to get here.
+ *
+ * The size field is wider than could possibly be needed for an on-disk leaf
+ * tuple, but this allows us to form leaf tuples even when the datum is too
+ * wide to be stored immediately, and it costs nothing because of alignment
+ * considerations.
+ *
+ * Normally, nextOffset links to the next tuple belonging to the same parent
+ * node (which must be on the same page). But when the root page is a leaf
+ * page, we don't chain its tuples, so nextOffset is always 0 on the root.
+ *
+ * size must be a multiple of MAXALIGN; also, it must be at least SGDTSIZE
+ * so that the tuple can be converted to REDIRECT status later. (This
+ * restriction only adds bytes for the null-datum case, otherwise alignment
+ * restrictions force it anyway.)
+ *
+ * In a leaf tuple for a NULL indexed value, there's no useful datum value;
+ * however, the SGDTSIZE limit ensures that's there's a Datum word there
+ * anyway, so SGLTDATUM can be applied safely as long as you don't do
+ * anything with the result.
+ *
+ * Minimum space to store SpGistLeafTuple on a page is 12 bytes tuple header
+ * and 4 bytes ItemIdData so 14 lower bits of nextOffset (accessed as
+ * SGLT_GET/SET_OFFSET) is enough to store actual tuple number on a page even
+ * if page size is 64Kb. Two higher bits are to store per-tuple
+ * information is there nulls mask exist and is there any included attribute
+ * of variable length type.
+ */
+
+typedef struct SpGistLeafTupleData
+{
+ unsigned int tupstate:2, /* LIVE/REDIRECT/DEAD/PLACEHOLDER */
+ size:30; /* large enough for any palloc'able value */
+ OffsetNumber nextOffset; /* higher 1 bit = 1 if included values has
+ * nulls, 2 bit = 1 if included values contain
+ * variable length values, lower 15 bits - is
+ * "actual" nextOffset i.e. number of next
+ * tuple in chain on a page, or
+ * InvalidOffsetNumber. They SHOULD NOT be
+ * set/read directly,
+ * SGLT_SET_XXX/SGLT_GET_XXX macros must be
+ * used instead. */
+ ItemPointerData heapPtr; /* TID of represented heap tuple */
+ /* leaf datum follows */
+
+ /*
+ * if SGLT_GET_CONTAINSNULLMASK nullmask follows. Its size (number of
+ * included columns/8)+1
+ */
+ /* include attributes follow if any */
+} SpGistLeafTupleData;
+
+typedef SpGistLeafTupleData *SpGistLeafTuple;
+
+#define SGLTHDRSZ MAXALIGN(sizeof(SpGistLeafTupleData))
+#define SGLTDATAPTR(x) (((char *) (x)) + SGLTHDRSZ)
+#define SGLTDATUM(x, s) ((s)->attLeafType.attbyval ? \
+ *(Datum *) SGLTDATAPTR(x) : \
+ PointerGetDatum(SGLTDATAPTR(x)))
+/*
+ * Accessor macros to get and set actual 14-bit offset and two bit flags from/to
+ * nextOffset value.
+ */
+#define SGLT_GET_OFFSET(x) ( (x) & 0x3FFF )
+#define SGLT_GET_CONTAINSNULLMASK(x) ( (x) >> 15 )
+#define SGLT_GET_CONTAINSVARATT(x) ( ( (x) & 4000 ) >> 14 )
+#define SGLT_SET_OFFSET(x,o) ( (x) = ( (x) & 0xC000 ) | ( (o) & 0x3FFF) )
+#define SGLT_SET_CONTAINSNULLMASK(x,n) ( (x) = ( (n) << 15 ) | ( (x) & 0x3FFF ) )
+#define SGLT_SET_CONTAINSVARATT(x,v) ( (x) = ( (v) << 14 ) | ( (x) & 0xBFFF ) )
+
+#define SGLT_GET_INCLUDE_TUPSIZE(x) SGLT_GET_OFFSET(x)
+#define SGLT_SET_INCLUDE_TUPSIZE(x,o) SGLT_SET_OFFSET(x,o)
+
+extern char *SpGistFormIncludeTuple(TupleDesc tupleDescriptor, Datum *values,
+ bool *isnull, uint16 *tupdata);
+
+/*
+ * SPGiST dead tuple: declaration for examining non-live tuples
+ *
+ * The tupstate field of this struct must match those of regular inner and
+ * leaf tuples, and its size field must match a leaf tuple's.
+ * Also, the pointer field must be in the same place as a leaf tuple's heapPtr
+ * field, to satisfy some Asserts that we make when replacing a leaf tuple
+ * with a dead tuple.
+ * We don't use nextOffset, but it's needed to align the pointer field.
+ */
+
typedef struct SpGistSearchItem
{
pairingheap_node phNode; /* pairing heap node */
@@ -160,14 +253,14 @@ typedef struct SpGistSearchItem
bool isLeaf; /* SearchItem is heap item */
bool recheck; /* qual recheck is needed */
bool recheckDistances; /* distance recheck is needed */
-
+ SpGistLeafTuple leafTuple;
/* array with numberOfOrderBys entries */
double distances[FLEXIBLE_ARRAY_MEMBER];
+ /* if there are include columns SpGistLeafTupleData follow */
} SpGistSearchItem;
#define SizeOfSpGistSearchItem(n_distances) \
(offsetof(SpGistSearchItem, distances) + sizeof(double) * (n_distances))
-
/*
* Private state of an index scan
*/
@@ -241,6 +334,7 @@ typedef struct SpGistCache
SpGistTypeDesc attLeafType; /* type of leaf-tuple values */
SpGistTypeDesc attPrefixType; /* type of inner-tuple prefix values */
SpGistTypeDesc attLabelType; /* type of node label values */
+ TupleDesc includeTupdesc;
SpGistLUPCache lastUsedPages; /* local storage of last-used info */
} SpGistCache;
@@ -321,60 +415,6 @@ typedef SpGistNodeTupleData *SpGistNodeTuple;
*(Datum *) SGNTDATAPTR(x) : \
PointerGetDatum(SGNTDATAPTR(x)))
-/*
- * SPGiST leaf tuple: carries a datum and a heap tuple TID
- *
- * In the simplest case, the datum is the same as the indexed value; but
- * it could also be a suffix or some other sort of delta that permits
- * reconstruction given knowledge of the prefix path traversed to get here.
- *
- * The size field is wider than could possibly be needed for an on-disk leaf
- * tuple, but this allows us to form leaf tuples even when the datum is too
- * wide to be stored immediately, and it costs nothing because of alignment
- * considerations.
- *
- * Normally, nextOffset links to the next tuple belonging to the same parent
- * node (which must be on the same page). But when the root page is a leaf
- * page, we don't chain its tuples, so nextOffset is always 0 on the root.
- *
- * size must be a multiple of MAXALIGN; also, it must be at least SGDTSIZE
- * so that the tuple can be converted to REDIRECT status later. (This
- * restriction only adds bytes for the null-datum case, otherwise alignment
- * restrictions force it anyway.)
- *
- * In a leaf tuple for a NULL indexed value, there's no useful datum value;
- * however, the SGDTSIZE limit ensures that's there's a Datum word there
- * anyway, so SGLTDATUM can be applied safely as long as you don't do
- * anything with the result.
- */
-typedef struct SpGistLeafTupleData
-{
- unsigned int tupstate:2, /* LIVE/REDIRECT/DEAD/PLACEHOLDER */
- size:30; /* large enough for any palloc'able value */
- OffsetNumber nextOffset; /* next tuple in chain, or InvalidOffsetNumber */
- ItemPointerData heapPtr; /* TID of represented heap tuple */
- /* leaf datum follows */
-} SpGistLeafTupleData;
-
-typedef SpGistLeafTupleData *SpGistLeafTuple;
-
-#define SGLTHDRSZ MAXALIGN(sizeof(SpGistLeafTupleData))
-#define SGLTDATAPTR(x) (((char *) (x)) + SGLTHDRSZ)
-#define SGLTDATUM(x, s) ((s)->attLeafType.attbyval ? \
- *(Datum *) SGLTDATAPTR(x) : \
- PointerGetDatum(SGLTDATAPTR(x)))
-
-/*
- * SPGiST dead tuple: declaration for examining non-live tuples
- *
- * The tupstate field of this struct must match those of regular inner and
- * leaf tuples, and its size field must match a leaf tuple's.
- * Also, the pointer field must be in the same place as a leaf tuple's heapPtr
- * field, to satisfy some Asserts that we make when replacing a leaf tuple
- * with a dead tuple.
- * We don't use nextOffset, but it's needed to align the pointer field.
- * pointer and xid are only valid when tupstate = REDIRECT.
- */
typedef struct SpGistDeadTupleData
{
unsigned int tupstate:2, /* LIVE/REDIRECT/DEAD/PLACEHOLDER */
@@ -394,7 +434,6 @@ typedef SpGistDeadTupleData *SpGistDeadTuple;
* size plus sizeof(ItemIdData) (for the line pointer). This works correctly
* so long as tuple sizes are always maxaligned.
*/
-
/* Page capacity after allowing for fixed header and special space */
#define SPGIST_PAGE_CAPACITY \
MAXALIGN_DOWN(BLCKSZ - \
@@ -456,9 +495,10 @@ extern void SpGistInitPage(Page page, uint16 f);
extern void SpGistInitBuffer(Buffer b, uint16 f);
extern void SpGistInitMetapage(Page page);
extern unsigned int SpGistGetTypeSize(SpGistTypeDesc *att, Datum datum);
+extern unsigned int SpgLeafSize(SpGistState *state, Datum *datum, bool *isnull);
extern SpGistLeafTuple spgFormLeafTuple(SpGistState *state,
ItemPointer heapPtr,
- Datum datum, bool isnull);
+ Datum *datum, bool *isnull);
extern SpGistNodeTuple spgFormNodeTuple(SpGistState *state,
Datum label, bool isnull);
extern SpGistInnerTuple spgFormInnerTuple(SpGistState *state,
@@ -466,6 +506,8 @@ extern SpGistInnerTuple spgFormInnerTuple(SpGistState *state,
int nNodes, SpGistNodeTuple *nodes);
extern SpGistDeadTuple spgFormDeadTuple(SpGistState *state, int tupstate,
BlockNumber blkno, OffsetNumber offnum);
+extern void SpGistDeformLeafTuple(SpGistLeafTuple tup, SpGistState *state,
+ Datum *datum, bool *isnull, bool key_value_isnull);
extern Datum *spgExtractNodeLabels(SpGistState *state,
SpGistInnerTuple innerTuple);
extern OffsetNumber SpGistPageAddNewItem(SpGistState *state, Page page,
@@ -484,7 +526,7 @@ extern void spgPageIndexMultiDelete(SpGistState *state, Page page,
int firststate, int reststate,
BlockNumber blkno, OffsetNumber offnum);
extern bool spgdoinsert(Relation index, SpGistState *state,
- ItemPointer heapPtr, Datum datum, bool isnull);
+ ItemPointer heapPtr, Datum *datum, bool *isnull);
/* spgproc.c */
extern double *spg_key_orderbys_distances(Datum key, bool isLeaf,
diff --git a/src/test/regress/expected/amutils.out b/src/test/regress/expected/amutils.out
index d92a6d12c6..93e6a43b6d 100644
--- a/src/test/regress/expected/amutils.out
+++ b/src/test/regress/expected/amutils.out
@@ -169,9 +169,9 @@ select amname, prop, pg_indexam_has_property(a.oid, prop) as p
hash | bogus |
spgist | can_order | f
spgist | can_unique | f
- spgist | can_multi_col | f
+ spgist | can_multi_col | t
spgist | can_exclude | t
- spgist | can_include | f
+ spgist | can_include | t
spgist | bogus |
(36 rows)
diff --git a/src/test/regress/expected/index_including.out b/src/test/regress/expected/index_including.out
index 8e5d53e712..4fd2b7e878 100644
--- a/src/test/regress/expected/index_including.out
+++ b/src/test/regress/expected/index_including.out
@@ -356,7 +356,6 @@ CREATE INDEX on tbl USING brin(c1, c2) INCLUDE (c3, c4);
ERROR: access method "brin" does not support included columns
CREATE INDEX on tbl USING gist(c3) INCLUDE (c1, c4);
CREATE INDEX on tbl USING spgist(c3) INCLUDE (c4);
-ERROR: access method "spgist" does not support included columns
CREATE INDEX on tbl USING gin(c1, c2) INCLUDE (c3, c4);
ERROR: access method "gin" does not support included columns
CREATE INDEX on tbl USING hash(c1, c2) INCLUDE (c3, c4);
diff --git a/src/test/regress/expected/index_including_spgist.out b/src/test/regress/expected/index_including_spgist.out
new file mode 100644
index 0000000000..fa64766fb7
--- /dev/null
+++ b/src/test/regress/expected/index_including_spgist.out
@@ -0,0 +1,139 @@
+/*
+ * 1.1. test CREATE INDEX with buffered build
+ */
+-- Regular index with included columns
+CREATE TABLE tbl_spgist (c1 int, c2 int, c3 int, c4 box);
+-- size is chosen to exceed page size and trigger actual truncation
+INSERT INTO tbl_spgist SELECT x, 2*x, 3*x, box(point(x,x+1),point(2*x,2*x+1)) FROM generate_series(1,8000) AS x;
+CREATE INDEX tbl_spgist_idx ON tbl_spgist using spgist (c4) INCLUDE (c1,c2,c3);
+SELECT pg_get_indexdef(i.indexrelid)
+FROM pg_index i JOIN pg_class c ON i.indexrelid = c.oid
+WHERE i.indrelid = 'tbl_spgist'::regclass ORDER BY c.relname;
+ pg_get_indexdef
+-----------------------------------------------------------------------------------------
+ CREATE INDEX tbl_spgist_idx ON public.tbl_spgist USING spgist (c4) INCLUDE (c1, c2, c3)
+(1 row)
+
+SELECT * FROM tbl_spgist where c4 <@ box(point(1,1),point(10,10));
+ c1 | c2 | c3 | c4
+----+----+----+-------------
+ 1 | 2 | 3 | (2,3),(1,2)
+ 2 | 4 | 6 | (4,5),(2,3)
+ 3 | 6 | 9 | (6,7),(3,4)
+ 4 | 8 | 12 | (8,9),(4,5)
+(4 rows)
+
+SET enable_bitmapscan TO off;
+EXPLAIN (costs off) SELECT * FROM tbl_spgist where c4 <@ box(point(1,1),point(10,10));
+ QUERY PLAN
+----------------------------------------------------
+ Index Only Scan using tbl_spgist_idx on tbl_spgist
+ Index Cond: (c4 <@ '(10,10),(1,1)'::box)
+(2 rows)
+
+SET enable_bitmapscan TO default;
+DROP TABLE tbl_spgist;
+/*
+ * 1.2. test CREATE INDEX with inserts
+ */
+-- Regular index with included columns
+CREATE TABLE tbl_spgist (c1 int, c2 int, c3 int, c4 box);
+-- size is chosen to exceed page size and trigger actual truncation
+CREATE INDEX tbl_spgist_idx ON tbl_spgist using spgist (c4) INCLUDE (c1,c2,c3);
+INSERT INTO tbl_spgist SELECT x, 2*x, 3*x, box(point(x,x+1),point(2*x,2*x+1)) FROM generate_series(1,8000) AS x;
+SELECT pg_get_indexdef(i.indexrelid)
+FROM pg_index i JOIN pg_class c ON i.indexrelid = c.oid
+WHERE i.indrelid = 'tbl_spgist'::regclass ORDER BY c.relname;
+ pg_get_indexdef
+-----------------------------------------------------------------------------------------
+ CREATE INDEX tbl_spgist_idx ON public.tbl_spgist USING spgist (c4) INCLUDE (c1, c2, c3)
+(1 row)
+
+SELECT * FROM tbl_spgist where c4 <@ box(point(1,1),point(10,10));
+ c1 | c2 | c3 | c4
+----+----+----+-------------
+ 1 | 2 | 3 | (2,3),(1,2)
+ 2 | 4 | 6 | (4,5),(2,3)
+ 3 | 6 | 9 | (6,7),(3,4)
+ 4 | 8 | 12 | (8,9),(4,5)
+(4 rows)
+
+SET enable_bitmapscan TO off;
+EXPLAIN (costs off) SELECT * FROM tbl_spgist where c4 <@ box(point(1,1),point(10,10));
+ QUERY PLAN
+----------------------------------------------------
+ Index Only Scan using tbl_spgist_idx on tbl_spgist
+ Index Cond: (c4 <@ '(10,10),(1,1)'::box)
+(2 rows)
+
+SET enable_bitmapscan TO default;
+DROP TABLE tbl_spgist;
+/*
+ * 2. CREATE INDEX CONCURRENTLY
+ */
+CREATE TABLE tbl_spgist (c1 int, c2 int, c3 int, c4 box);
+INSERT INTO tbl_spgist SELECT x, 2*x, 3*x, box(point(x,x+1),point(2*x,2*x+1)) FROM generate_series(1,10) AS x;
+CREATE INDEX CONCURRENTLY tbl_spgist_idx ON tbl_spgist using spgist (c4) INCLUDE (c1,c2,c3);
+SELECT indexdef FROM pg_indexes WHERE tablename = 'tbl_spgist' ORDER BY indexname;
+ indexdef
+-----------------------------------------------------------------------------------------
+ CREATE INDEX tbl_spgist_idx ON public.tbl_spgist USING spgist (c4) INCLUDE (c1, c2, c3)
+(1 row)
+
+DROP TABLE tbl_spgist;
+/*
+ * 3. REINDEX
+ */
+CREATE TABLE tbl_spgist (c1 int, c2 int, c3 int, c4 box);
+INSERT INTO tbl_spgist SELECT x, 2*x, 3*x, box(point(x,x+1),point(2*x,2*x+1)) FROM generate_series(1,10) AS x;
+CREATE INDEX tbl_spgist_idx ON tbl_spgist using spgist (c4) INCLUDE (c1,c3);
+SELECT indexdef FROM pg_indexes WHERE tablename = 'tbl_spgist' ORDER BY indexname;
+ indexdef
+-------------------------------------------------------------------------------------
+ CREATE INDEX tbl_spgist_idx ON public.tbl_spgist USING spgist (c4) INCLUDE (c1, c3)
+(1 row)
+
+REINDEX INDEX tbl_spgist_idx;
+SELECT indexdef FROM pg_indexes WHERE tablename = 'tbl_spgist' ORDER BY indexname;
+ indexdef
+-------------------------------------------------------------------------------------
+ CREATE INDEX tbl_spgist_idx ON public.tbl_spgist USING spgist (c4) INCLUDE (c1, c3)
+(1 row)
+
+ALTER TABLE tbl_spgist DROP COLUMN c1;
+SELECT indexdef FROM pg_indexes WHERE tablename = 'tbl_spgist' ORDER BY indexname;
+ indexdef
+----------
+(0 rows)
+
+DROP TABLE tbl_spgist;
+/*
+ * 4. Update, delete values in indexed table.
+ */
+CREATE TABLE tbl_spgist (c1 int, c2 int, c3 int, c4 box);
+INSERT INTO tbl_spgist SELECT x, 2*x, 3*x, box(point(x,x+1),point(2*x,2*x+1)) FROM generate_series(1,10) AS x;
+CREATE INDEX tbl_spgist_idx ON tbl_spgist using spgist (c4) INCLUDE (c1,c3);
+UPDATE tbl_spgist SET c1 = 100 WHERE c1 = 2;
+UPDATE tbl_spgist SET c1 = 1 WHERE c1 = 3;
+DELETE FROM tbl_spgist WHERE c1 = 5 OR c3 = 12;
+DROP TABLE tbl_spgist;
+/*
+ * 5. Alter column type.
+ */
+CREATE TABLE tbl_spgist (c1 int, c2 int, c3 int, c4 box);
+INSERT INTO tbl_spgist SELECT x, 2*x, 3*x, box(point(x,x+1),point(2*x,2*x+1)) FROM generate_series(1,10) AS x;
+CREATE INDEX tbl_spgist_idx ON tbl_spgist using spgist (c4) INCLUDE (c1,c3);
+ALTER TABLE tbl_spgist ALTER c1 TYPE bigint;
+ALTER TABLE tbl_spgist ALTER c3 TYPE bigint;
+\d tbl_spgist
+ Table "public.tbl_spgist"
+ Column | Type | Collation | Nullable | Default
+--------+---------+-----------+----------+---------
+ c1 | bigint | | |
+ c2 | integer | | |
+ c3 | bigint | | |
+ c4 | box | | |
+Indexes:
+ "tbl_spgist_idx" spgist (c4) INCLUDE (c1, c3)
+
+DROP TABLE tbl_spgist;
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 026ea880cd..985458a1a8 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -50,7 +50,7 @@ test: copy copyselect copydml insert insert_conflict
# ----------
test: create_misc create_operator create_procedure
# These depend on create_misc and create_operator
-test: create_index create_index_spgist create_view index_including index_including_gist
+test: create_index create_index_spgist create_view index_including index_including_gist index_including_spgist
# ----------
# Another group of parallel tests
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index 979d926119..f3df961535 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -68,6 +68,7 @@ test: create_index_spgist
test: create_view
test: index_including
test: index_including_gist
+test: index_including_spgist
test: create_aggregate
test: create_function_3
test: create_cast
diff --git a/src/test/regress/sql/index_including_spgist.sql b/src/test/regress/sql/index_including_spgist.sql
new file mode 100644
index 0000000000..a59e73aa22
--- /dev/null
+++ b/src/test/regress/sql/index_including_spgist.sql
@@ -0,0 +1,81 @@
+/*
+ * 1.1. test CREATE INDEX with buffered build
+ */
+
+-- Regular index with included columns
+CREATE TABLE tbl_spgist (c1 int, c2 int, c3 int, c4 box);
+-- size is chosen to exceed page size and trigger actual truncation
+INSERT INTO tbl_spgist SELECT x, 2*x, 3*x, box(point(x,x+1),point(2*x,2*x+1)) FROM generate_series(1,8000) AS x;
+CREATE INDEX tbl_spgist_idx ON tbl_spgist using spgist (c4) INCLUDE (c1,c2,c3);
+SELECT pg_get_indexdef(i.indexrelid)
+FROM pg_index i JOIN pg_class c ON i.indexrelid = c.oid
+WHERE i.indrelid = 'tbl_spgist'::regclass ORDER BY c.relname;
+SELECT * FROM tbl_spgist where c4 <@ box(point(1,1),point(10,10));
+SET enable_bitmapscan TO off;
+EXPLAIN (costs off) SELECT * FROM tbl_spgist where c4 <@ box(point(1,1),point(10,10));
+SET enable_bitmapscan TO default;
+DROP TABLE tbl_spgist;
+
+/*
+ * 1.2. test CREATE INDEX with inserts
+ */
+
+-- Regular index with included columns
+CREATE TABLE tbl_spgist (c1 int, c2 int, c3 int, c4 box);
+-- size is chosen to exceed page size and trigger actual truncation
+CREATE INDEX tbl_spgist_idx ON tbl_spgist using spgist (c4) INCLUDE (c1,c2,c3);
+INSERT INTO tbl_spgist SELECT x, 2*x, 3*x, box(point(x,x+1),point(2*x,2*x+1)) FROM generate_series(1,8000) AS x;
+SELECT pg_get_indexdef(i.indexrelid)
+FROM pg_index i JOIN pg_class c ON i.indexrelid = c.oid
+WHERE i.indrelid = 'tbl_spgist'::regclass ORDER BY c.relname;
+SELECT * FROM tbl_spgist where c4 <@ box(point(1,1),point(10,10));
+SET enable_bitmapscan TO off;
+EXPLAIN (costs off) SELECT * FROM tbl_spgist where c4 <@ box(point(1,1),point(10,10));
+SET enable_bitmapscan TO default;
+DROP TABLE tbl_spgist;
+
+/*
+ * 2. CREATE INDEX CONCURRENTLY
+ */
+CREATE TABLE tbl_spgist (c1 int, c2 int, c3 int, c4 box);
+INSERT INTO tbl_spgist SELECT x, 2*x, 3*x, box(point(x,x+1),point(2*x,2*x+1)) FROM generate_series(1,10) AS x;
+CREATE INDEX CONCURRENTLY tbl_spgist_idx ON tbl_spgist using spgist (c4) INCLUDE (c1,c2,c3);
+SELECT indexdef FROM pg_indexes WHERE tablename = 'tbl_spgist' ORDER BY indexname;
+DROP TABLE tbl_spgist;
+
+
+/*
+ * 3. REINDEX
+ */
+CREATE TABLE tbl_spgist (c1 int, c2 int, c3 int, c4 box);
+INSERT INTO tbl_spgist SELECT x, 2*x, 3*x, box(point(x,x+1),point(2*x,2*x+1)) FROM generate_series(1,10) AS x;
+CREATE INDEX tbl_spgist_idx ON tbl_spgist using spgist (c4) INCLUDE (c1,c3);
+SELECT indexdef FROM pg_indexes WHERE tablename = 'tbl_spgist' ORDER BY indexname;
+REINDEX INDEX tbl_spgist_idx;
+SELECT indexdef FROM pg_indexes WHERE tablename = 'tbl_spgist' ORDER BY indexname;
+ALTER TABLE tbl_spgist DROP COLUMN c1;
+SELECT indexdef FROM pg_indexes WHERE tablename = 'tbl_spgist' ORDER BY indexname;
+DROP TABLE tbl_spgist;
+
+/*
+ * 4. Update, delete values in indexed table.
+ */
+CREATE TABLE tbl_spgist (c1 int, c2 int, c3 int, c4 box);
+INSERT INTO tbl_spgist SELECT x, 2*x, 3*x, box(point(x,x+1),point(2*x,2*x+1)) FROM generate_series(1,10) AS x;
+CREATE INDEX tbl_spgist_idx ON tbl_spgist using spgist (c4) INCLUDE (c1,c3);
+UPDATE tbl_spgist SET c1 = 100 WHERE c1 = 2;
+UPDATE tbl_spgist SET c1 = 1 WHERE c1 = 3;
+DELETE FROM tbl_spgist WHERE c1 = 5 OR c3 = 12;
+DROP TABLE tbl_spgist;
+
+/*
+ * 5. Alter column type.
+ */
+CREATE TABLE tbl_spgist (c1 int, c2 int, c3 int, c4 box);
+INSERT INTO tbl_spgist SELECT x, 2*x, 3*x, box(point(x,x+1),point(2*x,2*x+1)) FROM generate_series(1,10) AS x;
+CREATE INDEX tbl_spgist_idx ON tbl_spgist using spgist (c4) INCLUDE (c1,c3);
+ALTER TABLE tbl_spgist ALTER c1 TYPE bigint;
+ALTER TABLE tbl_spgist ALTER c3 TYPE bigint;
+\d tbl_spgist
+DROP TABLE tbl_spgist;
+
--
2.28.0
вт, 11 авг. 2020 г. в 12:11, Pavel Borisov <pashkin.elfe@gmail.com>:
Show quoted text
I added changes in documentation into the patch.
--
Best regards,
Pavel BorisovPostgres Professional: http://postgrespro.com <http://www.postgrespro.com>
Attachments:
v5-0001-Covering-SP-GiST-index-support-for-INCLUDE-column.patchapplication/octet-stream; name=v5-0001-Covering-SP-GiST-index-support-for-INCLUDE-column.patchDownload
From 52d67cfbfe09ba085e4316f243e2df1a12f6f3c4 Mon Sep 17 00:00:00 2001
From: Pavel Borisov <pashkin.elfe@gmail.com>
Date: Tue, 11 Aug 2020 22:49:22 +0400
Subject: [PATCH v5] Covering SP-GiST index - support for INCLUDE columns
Adding INCLUDE colums for SPGiST index is intended to increase the speed of queries by making scan index only likewise
in btree and GiST index. These included values are added only to leaf tuples and they are not used in index tree search
but they can be fetched during index scan.
The other point of included columns is to overcome SP-GiST limitation of being single-column in principle. I.e. in
certain cases a single covering SP-GiST index can replace several separate ones with less disk space and shared buffers
consumption, faster update etc. Also there can be included any data types without SP-GiST supported opclasses.
Discussion: https://www.postgresql.org/message-id/flat/CALT9ZEFi-vMp4faht9f9Junb1nO3NOSjhpxTmbm1UGLMsLqiEQ@mail.gmail.com
---
doc/src/sgml/indices.sgml | 4 +-
doc/src/sgml/ref/create_index.sgml | 4 +-
doc/src/sgml/spgist.sgml | 8 +
src/backend/access/spgist/README | 2 +-
src/backend/access/spgist/spgdoinsert.c | 172 +++++---
src/backend/access/spgist/spginsert.c | 5 +-
src/backend/access/spgist/spgscan.c | 87 +++-
src/backend/access/spgist/spgutils.c | 384 ++++++++++++++++--
src/backend/access/spgist/spgvacuum.c | 25 +-
src/backend/access/spgist/spgxlog.c | 6 +-
src/include/access/spgist_private.h | 160 +++++---
src/test/regress/expected/amutils.out | 4 +-
src/test/regress/expected/index_including.out | 1 -
.../expected/index_including_spgist.out | 139 +++++++
src/test/regress/parallel_schedule | 2 +-
src/test/regress/serial_schedule | 1 +
.../regress/sql/index_including_spgist.sql | 81 ++++
17 files changed, 900 insertions(+), 185 deletions(-)
create mode 100644 src/test/regress/expected/index_including_spgist.out
create mode 100644 src/test/regress/sql/index_including_spgist.sql
diff --git a/doc/src/sgml/indices.sgml b/doc/src/sgml/indices.sgml
index 28adaba72d..c89cc6cb08 100644
--- a/doc/src/sgml/indices.sgml
+++ b/doc/src/sgml/indices.sgml
@@ -1194,8 +1194,8 @@ CREATE UNIQUE INDEX tab_x_y ON tab(x) INCLUDE (y);
likely to not need to access the heap. If the heap tuple must be visited
anyway, it costs nothing more to get the column's value from there.
Other restrictions are that expressions are not currently supported as
- included columns, and that only B-tree and GiST indexes currently support
- included columns.
+ included columns, and that only B-tree, GiST and SP-GiST indexes currently
+ support included columns.
</para>
<para>
diff --git a/doc/src/sgml/ref/create_index.sgml b/doc/src/sgml/ref/create_index.sgml
index ff87b2d28f..3d360bcf47 100644
--- a/doc/src/sgml/ref/create_index.sgml
+++ b/doc/src/sgml/ref/create_index.sgml
@@ -187,8 +187,8 @@ CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ [ IF NOT EXISTS ] <replaceable class=
</para>
<para>
- Currently, the B-tree and the GiST index access methods support this
- feature. In B-tree and the GiST indexes, the values of columns listed
+ Currently, the B-tree, GiST and SP-GiST index access methods support
+ this feature. In these indexes, the values of columns listed
in the <literal>INCLUDE</literal> clause are included in leaf tuples
which correspond to heap tuples, but are not included in upper-level
index entries used for tree navigation.
diff --git a/doc/src/sgml/spgist.sgml b/doc/src/sgml/spgist.sgml
index 0e04a08679..868a140a6a 100644
--- a/doc/src/sgml/spgist.sgml
+++ b/doc/src/sgml/spgist.sgml
@@ -240,6 +240,14 @@
inner tuples that are passed through to reach the leaf level.
</para>
+ <para>
+ In case when <acronym>SP-GiST</acronym> index is created with
+ <literal>INCLUDE</literal> clause i.e. covering index, leaf tuples also
+ contain data from included columns. This data is stored uncompressed and can have
+ data types without any SP-GiST operator class.
+
+ </para>
+
<para>
Inner tuples are more complex, since they are branching points in the
search tree. Each inner tuple contains a set of one or more
diff --git a/src/backend/access/spgist/README b/src/backend/access/spgist/README
index b55b073832..87e08431fa 100644
--- a/src/backend/access/spgist/README
+++ b/src/backend/access/spgist/README
@@ -73,8 +73,8 @@ Leaf tuple consists of:
Example:
radix tree - the rest of string (postfix)
quad and k-d tree - the point itself
-
ItemPointer to the heap
+ optional included colums values
NULLS HANDLING
diff --git a/src/backend/access/spgist/spgdoinsert.c b/src/backend/access/spgist/spgdoinsert.c
index 934d65b89f..4c133b7106 100644
--- a/src/backend/access/spgist/spgdoinsert.c
+++ b/src/backend/access/spgist/spgdoinsert.c
@@ -22,7 +22,7 @@
#include "miscadmin.h"
#include "storage/bufmgr.h"
#include "utils/rel.h"
-
+#include "access/htup_details.h"
/*
* SPPageDesc tracks all info about a page we are inserting into. In some
@@ -220,7 +220,7 @@ addLeafTuple(Relation index, SpGistState *state, SpGistLeafTuple leafTuple,
SpGistBlockIsRoot(current->blkno))
{
/* Tuple is not part of a chain */
- leafTuple->nextOffset = InvalidOffsetNumber;
+ SGLT_SET_OFFSET(leafTuple->nextOffset, InvalidOffsetNumber);
current->offnum = SpGistPageAddNewItem(state, current->page,
(Item) leafTuple, leafTuple->size,
NULL, false);
@@ -253,7 +253,7 @@ addLeafTuple(Relation index, SpGistState *state, SpGistLeafTuple leafTuple,
PageGetItemId(current->page, current->offnum));
if (head->tupstate == SPGIST_LIVE)
{
- leafTuple->nextOffset = head->nextOffset;
+ SGLT_SET_OFFSET(leafTuple->nextOffset, SGLT_GET_OFFSET(head->nextOffset));
offnum = SpGistPageAddNewItem(state, current->page,
(Item) leafTuple, leafTuple->size,
NULL, false);
@@ -264,14 +264,14 @@ addLeafTuple(Relation index, SpGistState *state, SpGistLeafTuple leafTuple,
*/
head = (SpGistLeafTuple) PageGetItem(current->page,
PageGetItemId(current->page, current->offnum));
- head->nextOffset = offnum;
+ SGLT_SET_OFFSET(head->nextOffset, offnum);
xlrec.offnumLeaf = offnum;
xlrec.offnumHeadLeaf = current->offnum;
}
else if (head->tupstate == SPGIST_DEAD)
{
- leafTuple->nextOffset = InvalidOffsetNumber;
+ SGLT_SET_OFFSET(leafTuple->nextOffset, InvalidOffsetNumber);
PageIndexTupleDelete(current->page, current->offnum);
if (PageAddItem(current->page,
(Item) leafTuple, leafTuple->size,
@@ -362,13 +362,13 @@ checkSplitConditions(Relation index, SpGistState *state,
{
/* We could see a DEAD tuple as first/only chain item */
Assert(i == current->offnum);
- Assert(it->nextOffset == InvalidOffsetNumber);
+ Assert(SGLT_GET_OFFSET(it->nextOffset) == InvalidOffsetNumber);
/* Don't count it in result, because it won't go to other page */
}
else
elog(ERROR, "unexpected SPGiST tuple state: %d", it->tupstate);
- i = it->nextOffset;
+ i = SGLT_GET_OFFSET(it->nextOffset);
}
*nToSplit = n;
@@ -437,7 +437,7 @@ moveLeafs(Relation index, SpGistState *state,
{
/* We could see a DEAD tuple as first/only chain item */
Assert(i == current->offnum);
- Assert(it->nextOffset == InvalidOffsetNumber);
+ Assert(SGLT_GET_OFFSET(it->nextOffset) == InvalidOffsetNumber);
/* We don't want to move it, so don't count it in size */
toDelete[nDelete] = i;
nDelete++;
@@ -446,7 +446,7 @@ moveLeafs(Relation index, SpGistState *state,
else
elog(ERROR, "unexpected SPGiST tuple state: %d", it->tupstate);
- i = it->nextOffset;
+ i = SGLT_GET_OFFSET(it->nextOffset);
}
/* Find a leaf page that will hold them */
@@ -475,7 +475,7 @@ moveLeafs(Relation index, SpGistState *state,
* don't care). We're modifying the tuple on the source page
* here, but it's okay since we're about to delete it.
*/
- it->nextOffset = r;
+ SGLT_SET_OFFSET(it->nextOffset, r);
r = SpGistPageAddNewItem(state, npage, (Item) it, it->size,
&startOffset, false);
@@ -490,7 +490,7 @@ moveLeafs(Relation index, SpGistState *state,
}
/* add the new tuple as well */
- newLeafTuple->nextOffset = r;
+ SGLT_SET_OFFSET(newLeafTuple->nextOffset, r);
r = SpGistPageAddNewItem(state, npage,
(Item) newLeafTuple, newLeafTuple->size,
&startOffset, false);
@@ -709,6 +709,9 @@ doPickSplit(Relation index, SpGistState *state,
int nToDelete,
nToInsert,
maxToInclude;
+ Datum *leafChainDatums;
+ bool *leafChainIsnulls;
+ const int natts = IndexRelationGetNumberOfAttributes(index);
in.level = level;
@@ -723,14 +726,16 @@ doPickSplit(Relation index, SpGistState *state,
toInsert = (OffsetNumber *) palloc(sizeof(OffsetNumber) * n);
newLeafs = (SpGistLeafTuple *) palloc(sizeof(SpGistLeafTuple) * n);
leafPageSelect = (uint8 *) palloc(sizeof(uint8) * n);
-
STORE_STATE(state, xlrec.stateSrc);
+ leafChainDatums = (Datum *) palloc(n * natts * sizeof(Datum));
+ leafChainIsnulls = (bool *) palloc(n * natts * sizeof(bool));
+
/*
- * Form list of leaf tuples which will be distributed as split result;
- * also, count up the amount of space that will be freed from current.
- * (Note that in the non-root case, we won't actually delete the old
- * tuples, only replace them with redirects or placeholders.)
+ * Collect leaf tuples which will be distributed as split result; also,
+ * count up the amount of space that will be freed from current. (Note
+ * that in the non-root case, we won't actually delete the old tuples,
+ * only replace them with redirects or placeholders.)
*
* Note: the SGLTDATUM calls here are safe even when dealing with a nulls
* page. For a pass-by-value data type we will fetch a word that must
@@ -738,7 +743,15 @@ doPickSplit(Relation index, SpGistState *state,
* tuples must have size at least SGDTSIZE). For a pass-by-reference type
* we are just computing a pointer that isn't going to get dereferenced.
* So it's not worth guarding the calls with isNulls checks.
+ *
+ * Datums and isnulls of all leaf tuple attributes in a chain are
+ * collected into 2-d arrays: (number of tuples in chain) x (number of
+ * attributes) First attribute is key, the other - included attributes (if
+ * any). After picksplit we need to form new leaf tuples as key attribute
+ * length can change which can affect alignment of every include
+ * attribute.
*/
+
nToInsert = 0;
nToDelete = 0;
spaceToDelete = 0;
@@ -759,6 +772,8 @@ doPickSplit(Relation index, SpGistState *state,
{
in.datums[nToInsert] = SGLTDATUM(it, state);
heapPtrs[nToInsert] = it->heapPtr;
+ SpGistDeformLeafTuple(it, state, leafChainDatums + nToInsert * natts,
+ leafChainIsnulls + nToInsert * natts, isNulls);
nToInsert++;
toDelete[nToDelete] = i;
nToDelete++;
@@ -784,6 +799,9 @@ doPickSplit(Relation index, SpGistState *state,
{
in.datums[nToInsert] = SGLTDATUM(it, state);
heapPtrs[nToInsert] = it->heapPtr;
+
+ SpGistDeformLeafTuple(it, state, leafChainDatums + nToInsert * natts,
+ leafChainIsnulls + nToInsert * natts, isNulls);
nToInsert++;
toDelete[nToDelete] = i;
nToDelete++;
@@ -795,7 +813,7 @@ doPickSplit(Relation index, SpGistState *state,
{
/* We could see a DEAD tuple as first/only chain item */
Assert(i == current->offnum);
- Assert(it->nextOffset == InvalidOffsetNumber);
+ Assert(SGLT_GET_OFFSET(it->nextOffset) == InvalidOffsetNumber);
toDelete[nToDelete] = i;
nToDelete++;
/* replacing it with redirect will save no space */
@@ -803,7 +821,7 @@ doPickSplit(Relation index, SpGistState *state,
else
elog(ERROR, "unexpected SPGiST tuple state: %d", it->tupstate);
- i = it->nextOffset;
+ i = SGLT_GET_OFFSET(it->nextOffset);
}
}
in.nTuples = nToInsert;
@@ -816,10 +834,17 @@ doPickSplit(Relation index, SpGistState *state,
*/
in.datums[in.nTuples] = SGLTDATUM(newLeafTuple, state);
heapPtrs[in.nTuples] = newLeafTuple->heapPtr;
+
+ SpGistDeformLeafTuple(newLeafTuple, state, leafChainDatums + (in.nTuples) * natts,
+ leafChainIsnulls + (in.nTuples) * natts, isNulls);
in.nTuples++;
memset(&out, 0, sizeof(out));
+ /*
+ * Process collected key values of tuples from the chain. Included values
+ * are used to build fresh leaf tuples unchanged.
+ */
if (!isNulls)
{
/*
@@ -837,9 +862,11 @@ doPickSplit(Relation index, SpGistState *state,
totalLeafSizes = 0;
for (i = 0; i < in.nTuples; i++)
{
- newLeafs[i] = spgFormLeafTuple(state, heapPtrs + i,
- out.leafTupleDatums[i],
- false);
+ *(leafChainDatums + i * natts) = (Datum) out.leafTupleDatums[i];
+ *(leafChainIsnulls + i * natts) = false;
+
+ newLeafs[i] = spgFormLeafTuple(state, heapPtrs + i, leafChainDatums + i * natts,
+ leafChainIsnulls + i * natts);
totalLeafSizes += newLeafs[i]->size + sizeof(ItemIdData);
}
}
@@ -860,9 +887,14 @@ doPickSplit(Relation index, SpGistState *state,
totalLeafSizes = 0;
for (i = 0; i < in.nTuples; i++)
{
- newLeafs[i] = spgFormLeafTuple(state, heapPtrs + i,
- (Datum) 0,
- true);
+ /*
+ * Nulls tree can contain only null key values.
+ */
+ *(leafChainDatums + i * natts) = (Datum) 0;
+ *(leafChainIsnulls + i * natts) = true;
+
+ newLeafs[i] = spgFormLeafTuple(state, heapPtrs + i, leafChainDatums + i * natts,
+ leafChainIsnulls + i * natts);
totalLeafSizes += newLeafs[i]->size + sizeof(ItemIdData);
}
}
@@ -1196,10 +1228,10 @@ doPickSplit(Relation index, SpGistState *state,
if (ItemPointerIsValid(&nodes[n]->t_tid))
{
Assert(ItemPointerGetBlockNumber(&nodes[n]->t_tid) == leafBlock);
- it->nextOffset = ItemPointerGetOffsetNumber(&nodes[n]->t_tid);
+ SGLT_SET_OFFSET(it->nextOffset, ItemPointerGetOffsetNumber(&nodes[n]->t_tid));
}
else
- it->nextOffset = InvalidOffsetNumber;
+ SGLT_SET_OFFSET(it->nextOffset, InvalidOffsetNumber);
/* Insert it on page */
newoffset = SpGistPageAddNewItem(state, BufferGetPage(leafBuffer),
@@ -1889,67 +1921,83 @@ spgSplitNodeAction(Relation index, SpGistState *state,
*/
bool
spgdoinsert(Relation index, SpGistState *state,
- ItemPointer heapPtr, Datum datum, bool isnull)
+ ItemPointer heapPtr, Datum *datum, bool *isnull)
{
int level = 0;
- Datum leafDatum;
+ Datum *leafDatum;
int leafSize;
SPPageDesc current,
parent;
FmgrInfo *procinfo = NULL;
+ int i;
/*
* Look up FmgrInfo of the user-defined choose function once, to save
* cycles in the loop below.
*/
- if (!isnull)
+ if (!isnull[0])
procinfo = index_getprocinfo(index, 1, SPGIST_CHOOSE_PROC);
/*
* Prepare the leaf datum to insert.
- *
+ */
+
+ leafDatum = (Datum *) palloc0(sizeof(Datum) * (IndexRelationGetNumberOfAttributes(index)));
+
+ /*
* If an optional "compress" method is provided, then call it to form the
- * leaf datum from the input datum. Otherwise store the input datum as
- * is. Since we don't use index_form_tuple in this AM, we have to make
- * sure value to be inserted is not toasted; FormIndexDatum doesn't
- * guarantee that. But we assume the "compress" method to return an
- * untoasted value.
+ * key datum from the input datum. Otherwise store the input datum as is.
+ * Since we don't use index_form_tuple in this AM, we have to make sure
+ * value to be inserted is not toasted; FormIndexDatum doesn't guarantee
+ * that. But we assume the "compress" method to return an untoasted
+ * value.
*/
- if (!isnull)
+ if (!isnull[0])
{
if (OidIsValid(index_getprocid(index, 1, SPGIST_COMPRESS_PROC)))
{
FmgrInfo *compressProcinfo = NULL;
compressProcinfo = index_getprocinfo(index, 1, SPGIST_COMPRESS_PROC);
- leafDatum = FunctionCall1Coll(compressProcinfo,
- index->rd_indcollation[0],
- datum);
+ leafDatum[0] = FunctionCall1Coll(compressProcinfo,
+ index->rd_indcollation[0],
+ datum[0]);
}
else
{
Assert(state->attLeafType.type == state->attType.type);
if (state->attType.attlen == -1)
- leafDatum = PointerGetDatum(PG_DETOAST_DATUM(datum));
+ leafDatum[0] = PointerGetDatum(PG_DETOAST_DATUM(datum[0]));
else
- leafDatum = datum;
+ leafDatum[0] = datum[0];
}
}
else
- leafDatum = (Datum) 0;
+ leafDatum[0] = (Datum) 0;
+
+ for (i = 1; i < IndexRelationGetNumberOfAttributes(index); i++)
+ {
+ if (!isnull[i])
+ {
+ if (TupleDescAttr(state->includeTupdesc, i - 1)->attlen == -1)
+ leafDatum[i] = PointerGetDatum(PG_DETOAST_DATUM(datum[i]));
+ else
+ leafDatum[i] = datum[i];
+ }
+ else
+ leafDatum[i] = (Datum) 0;
+ }
+
/*
- * Compute space needed for a leaf tuple containing the given datum.
+ * Compute space needed on a page for a leaf tuple containing the given
+ * datum.
*
* If it isn't gonna fit, and the opclass can't reduce the datum size by
* suffixing, bail out now rather than getting into an endless loop.
*/
- if (!isnull)
- leafSize = SGLTHDRSZ + sizeof(ItemIdData) +
- SpGistGetTypeSize(&state->attLeafType, leafDatum);
- else
- leafSize = SGDTSIZE + sizeof(ItemIdData);
+ leafSize = SpgLeafSize(state, leafDatum, isnull) + sizeof(ItemIdData);
if (leafSize > SPGIST_PAGE_CAPACITY && !state->config.longValuesOK)
ereport(ERROR,
@@ -1961,7 +2009,7 @@ spgdoinsert(Relation index, SpGistState *state,
errhint("Values larger than a buffer page cannot be indexed.")));
/* Initialize "current" to the appropriate root page */
- current.blkno = isnull ? SPGIST_NULL_BLKNO : SPGIST_ROOT_BLKNO;
+ current.blkno = isnull[0] ? SPGIST_NULL_BLKNO : SPGIST_ROOT_BLKNO;
current.buffer = InvalidBuffer;
current.page = NULL;
current.offnum = FirstOffsetNumber;
@@ -1995,7 +2043,7 @@ spgdoinsert(Relation index, SpGistState *state,
*/
current.buffer =
SpGistGetBuffer(index,
- GBUF_LEAF | (isnull ? GBUF_NULLS : 0),
+ GBUF_LEAF | (isnull[0] ? GBUF_NULLS : 0),
Min(leafSize, SPGIST_PAGE_CAPACITY),
&isNew);
current.blkno = BufferGetBlockNumber(current.buffer);
@@ -2037,7 +2085,7 @@ spgdoinsert(Relation index, SpGistState *state,
current.page = BufferGetPage(current.buffer);
/* should not arrive at a page of the wrong type */
- if (isnull ? !SpGistPageStoresNulls(current.page) :
+ if (isnull[0] ? !SpGistPageStoresNulls(current.page) :
SpGistPageStoresNulls(current.page))
elog(ERROR, "SPGiST index page %u has wrong nulls flag",
current.blkno);
@@ -2054,7 +2102,7 @@ spgdoinsert(Relation index, SpGistState *state,
{
/* it fits on page, so insert it and we're done */
addLeafTuple(index, state, leafTuple,
- ¤t, &parent, isnull, isNew);
+ ¤t, &parent, isnull[0], isNew);
break;
}
else if ((sizeToSplit =
@@ -2068,14 +2116,14 @@ spgdoinsert(Relation index, SpGistState *state,
* chain to another leaf page rather than splitting it.
*/
Assert(!isNew);
- moveLeafs(index, state, ¤t, &parent, leafTuple, isnull);
+ moveLeafs(index, state, ¤t, &parent, leafTuple, isnull[0]);
break; /* we're done */
}
else
{
/* picksplit */
if (doPickSplit(index, state, ¤t, &parent,
- leafTuple, level, isnull, isNew))
+ leafTuple, level, isnull[0], isNew))
break; /* doPickSplit installed new tuples */
/* leaf tuple will not be inserted yet */
@@ -2110,8 +2158,8 @@ spgdoinsert(Relation index, SpGistState *state,
innerTuple = (SpGistInnerTuple) PageGetItem(current.page,
PageGetItemId(current.page, current.offnum));
- in.datum = datum;
- in.leafDatum = leafDatum;
+ in.datum = datum[0];
+ in.leafDatum = leafDatum[0];
in.level = level;
in.allTheSame = innerTuple->allTheSame;
in.hasPrefix = (innerTuple->prefixSize > 0);
@@ -2121,7 +2169,7 @@ spgdoinsert(Relation index, SpGistState *state,
memset(&out, 0, sizeof(out));
- if (!isnull)
+ if (!isnull[0])
{
/* use user-defined choose method */
FunctionCall2Coll(procinfo,
@@ -2158,11 +2206,11 @@ spgdoinsert(Relation index, SpGistState *state,
/* Adjust level as per opclass request */
level += out.result.matchNode.levelAdd;
/* Replace leafDatum and recompute leafSize */
- if (!isnull)
+ if (!isnull[0])
{
- leafDatum = out.result.matchNode.restDatum;
- leafSize = SGLTHDRSZ + sizeof(ItemIdData) +
- SpGistGetTypeSize(&state->attLeafType, leafDatum);
+ leafDatum[0] = out.result.matchNode.restDatum;
+ leafSize = SpgLeafSize(state, leafDatum, isnull) +
+ sizeof(ItemIdData);
}
/*
@@ -2227,6 +2275,6 @@ spgdoinsert(Relation index, SpGistState *state,
SpGistSetLastUsedPage(index, parent.buffer);
UnlockReleaseBuffer(parent.buffer);
}
-
+ pfree(leafDatum);
return true;
}
diff --git a/src/backend/access/spgist/spginsert.c b/src/backend/access/spgist/spginsert.c
index e4508a2b92..b54ae85f6e 100644
--- a/src/backend/access/spgist/spginsert.c
+++ b/src/backend/access/spgist/spginsert.c
@@ -55,8 +55,7 @@ spgistBuildCallback(Relation index, ItemPointer tid, Datum *values,
* lock on some buffer. So we need to be willing to retry. We can flush
* any temp data when retrying.
*/
- while (!spgdoinsert(index, &buildstate->spgstate, tid,
- *values, *isnull))
+ while (!spgdoinsert(index, &buildstate->spgstate, tid, values, isnull))
{
MemoryContextReset(buildstate->tmpCtx);
}
@@ -226,7 +225,7 @@ spginsert(Relation index, Datum *values, bool *isnull,
* to avoid cumulative memory consumption. That means we also have to
* redo initSpGistState(), but it's cheap enough not to matter.
*/
- while (!spgdoinsert(index, &spgstate, ht_ctid, *values, *isnull))
+ while (!spgdoinsert(index, &spgstate, ht_ctid, values, isnull))
{
MemoryContextReset(insertCtx);
initSpGistState(&spgstate, index);
diff --git a/src/backend/access/spgist/spgscan.c b/src/backend/access/spgist/spgscan.c
index 4d506bfb9a..5a3c7c50cf 100644
--- a/src/backend/access/spgist/spgscan.c
+++ b/src/backend/access/spgist/spgscan.c
@@ -28,7 +28,8 @@
typedef void (*storeRes_func) (SpGistScanOpaque so, ItemPointer heapPtr,
Datum leafValue, bool isNull, bool recheck,
- bool recheckDistances, double *distances);
+ bool recheckDistances, double *distances,
+ SpGistLeafTuple leafTuple);
/*
* Pairing heap comparison function for the SpGistSearchItem queue.
@@ -88,6 +89,9 @@ spgFreeSearchItem(SpGistScanOpaque so, SpGistSearchItem *item)
if (item->traversalValue)
pfree(item->traversalValue);
+ if (item->isLeaf && item->leafTuple)
+ pfree(item->leafTuple);
+
pfree(item);
}
@@ -134,6 +138,8 @@ spgAddStartItem(SpGistScanOpaque so, bool isnull)
startEntry->recheck = false;
startEntry->recheckDistances = false;
+ startEntry->leafTuple = NULL;
+
spgAddSearchItemToQueue(so, startEntry);
}
@@ -438,14 +444,30 @@ spgendscan(IndexScanDesc scan)
* Leaf SpGistSearchItem constructor, called in queue context
*/
static SpGistSearchItem *
-spgNewHeapItem(SpGistScanOpaque so, int level, ItemPointer heapPtr,
+spgNewHeapItem(SpGistScanOpaque so, int level, SpGistLeafTuple leafTuple,
Datum leafValue, bool recheck, bool recheckDistances,
bool isnull, double *distances)
{
SpGistSearchItem *item = spgAllocSearchItem(so, isnull, distances);
+ /*
+ * If there are include attributes search item in the queue should contain
+ * them.
+ */
+ if (so->state.includeTupdesc)
+ {
+ Assert(so->state.includeTupdesc->natts);
+
+ item->leafTuple = palloc(leafTuple->size);
+ memcpy(item->leafTuple, leafTuple, leafTuple->size);
+ }
+ else
+ {
+ item->leafTuple = NULL;
+ }
+
item->level = level;
- item->heapPtr = *heapPtr;
+ item->heapPtr = leafTuple->heapPtr;
/* copy value to queue cxt out of tmp cxt */
item->value = isnull ? (Datum) 0 :
datumCopy(leafValue, so->state.attLeafType.attbyval,
@@ -503,6 +525,8 @@ spgLeafTest(SpGistScanOpaque so, SpGistSearchItem *item,
in.returnData = so->want_itup;
in.leafDatum = SGLTDATUM(leafTuple, &so->state);
+
+
out.leafValue = (Datum) 0;
out.recheck = false;
out.distances = NULL;
@@ -528,7 +552,7 @@ spgLeafTest(SpGistScanOpaque so, SpGistSearchItem *item,
/* the scan is ordered -> add the item to the queue */
MemoryContext oldCxt = MemoryContextSwitchTo(so->traversalCxt);
SpGistSearchItem *heapItem = spgNewHeapItem(so, item->level,
- &leafTuple->heapPtr,
+ leafTuple,
leafValue,
recheck,
recheckDistances,
@@ -543,8 +567,10 @@ spgLeafTest(SpGistScanOpaque so, SpGistSearchItem *item,
{
/* non-ordered scan, so report the item right away */
Assert(!recheckDistances);
+
storeRes(so, &leafTuple->heapPtr, leafValue, isnull,
- recheck, false, NULL);
+ recheck, false, NULL, leafTuple);
+
*reportedSome = true;
}
}
@@ -736,7 +762,7 @@ spgTestLeafTuple(SpGistScanOpaque so,
/* dead tuple should be first in chain */
Assert(offset == ItemPointerGetOffsetNumber(&item->heapPtr));
/* No live entries on this page */
- Assert(leafTuple->nextOffset == InvalidOffsetNumber);
+ Assert(SGLT_GET_OFFSET(leafTuple->nextOffset) == InvalidOffsetNumber);
return SpGistBreakOffsetNumber;
}
}
@@ -750,7 +776,7 @@ spgTestLeafTuple(SpGistScanOpaque so,
spgLeafTest(so, item, leafTuple, isnull, reportedSome, storeRes);
- return leafTuple->nextOffset;
+ return SGLT_GET_OFFSET(leafTuple->nextOffset);
}
/*
@@ -782,8 +808,8 @@ redirect:
{
/* We store heap items in the queue only in case of ordered search */
Assert(so->numberOfNonNullOrderBys > 0);
- storeRes(so, &item->heapPtr, item->value, item->isNull,
- item->recheck, item->recheckDistances, item->distances);
+ storeRes(so, &item->heapPtr, item->value, item->isNull, item->recheck,
+ item->recheckDistances, item->distances, item->leafTuple);
reportedSome = true;
}
else
@@ -877,7 +903,7 @@ redirect:
static void
storeBitmap(SpGistScanOpaque so, ItemPointer heapPtr,
Datum leafValue, bool isnull, bool recheck, bool recheckDistances,
- double *distances)
+ double *distances, SpGistLeafTuple leafTuple)
{
Assert(!recheckDistances && !distances);
tbm_add_tuples(so->tbm, heapPtr, 1, recheck);
@@ -904,7 +930,7 @@ spggetbitmap(IndexScanDesc scan, TIDBitmap *tbm)
static void
storeGettuple(SpGistScanOpaque so, ItemPointer heapPtr,
Datum leafValue, bool isnull, bool recheck, bool recheckDistances,
- double *nonNullDistances)
+ double *nonNullDistances, SpGistLeafTuple leafTuple)
{
Assert(so->nPtrs < MaxIndexTuplesPerPage);
so->heapPtrs[so->nPtrs] = *heapPtr;
@@ -949,9 +975,38 @@ storeGettuple(SpGistScanOpaque so, ItemPointer heapPtr,
* Reconstruct index data. We have to copy the datum out of the temp
* context anyway, so we may as well create the tuple here.
*/
- so->reconTups[so->nPtrs] = heap_form_tuple(so->indexTupDesc,
- &leafValue,
- &isnull);
+ if (so->state.includeTupdesc)
+ {
+ /* Add included attributes */
+ Datum *leafDatums;
+ bool *leafIsnulls;
+
+ Assert(so->state.includeTupdesc->natts);
+
+ leafDatums = (Datum *) palloc(sizeof(Datum) * (so->state.includeTupdesc->natts + 1));
+ leafIsnulls = (bool *) palloc(sizeof(bool) * (so->state.includeTupdesc->natts + 1));
+
+ SpGistDeformLeafTuple(leafTuple, &so->state, leafDatums, leafIsnulls, isnull);
+
+ /*
+ * override key value extracted from LeafTuple in case we've
+ * reconstructed it already
+ */
+ leafDatums[0] = leafValue;
+ leafIsnulls[0] = isnull;
+
+ so->reconTups[so->nPtrs] = heap_form_tuple(so->indexTupDesc,
+ leafDatums,
+ leafIsnulls);
+ pfree(leafDatums);
+ pfree(leafIsnulls);
+ }
+ else
+ {
+ so->reconTups[so->nPtrs] = heap_form_tuple(so->indexTupDesc,
+ &leafValue,
+ &isnull);
+ }
}
so->nPtrs++;
}
@@ -1019,6 +1074,10 @@ spgcanreturn(Relation index, int attno)
{
SpGistCache *cache;
+ /* Included attributes always can be fetched for index-only scans */
+ if (attno > 1)
+ return true;
+
/* We can do it if the opclass config function says so */
cache = spgGetCache(index);
diff --git a/src/backend/access/spgist/spgutils.c b/src/backend/access/spgist/spgutils.c
index 0efe05e552..93c99fca4b 100644
--- a/src/backend/access/spgist/spgutils.c
+++ b/src/backend/access/spgist/spgutils.c
@@ -31,7 +31,18 @@
#include "utils/index_selfuncs.h"
#include "utils/lsyscache.h"
#include "utils/syscache.h"
+#include "access/itup.h"
+#include "access/detoast.h"
+#include "access/toast_internals.h"
+#include "access/heaptoast.h"
+#include "utils/expandeddatum.h"
+/* Does att's datatype allow packing into the 1-byte-header varlena format? */
+#define ATT_IS_PACKABLE(att) \
+ ((att)->attlen == -1 && (att)->attstorage != TYPSTORAGE_PLAIN)
+
+Size spgIncludedDataSize(TupleDesc tupleDesc, Datum *values,
+ bool *isnull, Size start);
/*
* SP-GiST handler function: return IndexAmRoutine with access method parameters
@@ -49,7 +60,7 @@ spghandler(PG_FUNCTION_ARGS)
amroutine->amcanorderbyop = true;
amroutine->amcanbackward = false;
amroutine->amcanunique = false;
- amroutine->amcanmulticol = false;
+ amroutine->amcanmulticol = true;
amroutine->amoptionalkey = true;
amroutine->amsearcharray = false;
amroutine->amsearchnulls = true;
@@ -57,7 +68,7 @@ spghandler(PG_FUNCTION_ARGS)
amroutine->amclusterable = false;
amroutine->ampredlocks = false;
amroutine->amcanparallel = false;
- amroutine->amcaninclude = false;
+ amroutine->amcaninclude = true;
amroutine->amusemaintenanceworkmem = false;
amroutine->amparallelvacuumoptions =
VACUUM_OPTION_PARALLEL_BULKDEL | VACUUM_OPTION_PARALLEL_COND_CLEANUP;
@@ -116,14 +127,21 @@ spgGetCache(Relation index)
cache = MemoryContextAllocZero(index->rd_indexcxt,
sizeof(SpGistCache));
- /* SPGiST doesn't support multi-column indexes */
- Assert(index->rd_att->natts == 1);
+ /*
+ * SPGiST should have one key column and can also have included
+ * columns
+ */
+ if (IndexRelationGetNumberOfKeyAttributes(index) != 1)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("SPGiST index can have only one key column")));
/*
- * Get the actual data type of the indexed column from the index
- * tupdesc. We pass this to the opclass config function so that
- * polymorphic opclasses are possible.
+ * Get the actual data type of the key column from the index tupdesc.
+ * We pass this to the opclass config function so that polymorphic
+ * opclasses are possible.
*/
+
atttype = TupleDescAttr(index->rd_att, 0)->atttypid;
/* Call the config function to get config info for the opclass */
@@ -156,6 +174,7 @@ spgGetCache(Relation index)
fillTypeDesc(&cache->attPrefixType, cache->config.prefixType);
fillTypeDesc(&cache->attLabelType, cache->config.labelType);
+
/* Last, get the lastUsedPages data from the metapage */
metabuffer = ReadBuffer(index, SPGIST_METAPAGE_BLKNO);
LockBuffer(metabuffer, BUFFER_LOCK_SHARE);
@@ -177,7 +196,25 @@ spgGetCache(Relation index)
/* assume it's up to date */
cache = (SpGistCache *) index->rd_amcache;
}
+ /* Form descriptor for included columns if any */
+ if (IndexRelationGetNumberOfAttributes(index) > 1)
+ {
+ int i;
+
+ cache->includeTupdesc = CreateTemplateTupleDesc(
+ IndexRelationGetNumberOfAttributes(index) - 1);
+
+ for (i = 0; i < IndexRelationGetNumberOfAttributes(index) - 1; i++)
+ {
+ TupleDescInitEntry(cache->includeTupdesc, i + 1, NULL,
+ TupleDescAttr(index->rd_att, i + 1)->atttypid,
+ -1, 0);
+ TupleDescAttr(cache->includeTupdesc, i)->attstorage = TYPSTORAGE_PLAIN;
+ }
+ }
+ else
+ cache->includeTupdesc = NULL;
return cache;
}
@@ -190,6 +227,7 @@ initSpGistState(SpGistState *state, Relation index)
/* Get cached static information about index */
cache = spgGetCache(index);
+ state->includeTupdesc = cache->includeTupdesc;
state->config = cache->config;
state->attType = cache->attType;
state->attLeafType = cache->attLeafType;
@@ -603,7 +641,7 @@ spgoptions(Datum reloptions, bool validate)
/*
* Get the space needed to store a non-null datum of the indicated type.
- * Note the result is already rounded up to a MAXALIGN boundary.
+ * Note the result is not maxaligned and this should be done by caller if needed.
* Also, we follow the SPGiST convention that pass-by-val types are
* just stored in their Datum representation (compare memcpyDatum).
*/
@@ -619,7 +657,7 @@ SpGistGetTypeSize(SpGistTypeDesc *att, Datum datum)
else
size = VARSIZE_ANY(datum);
- return MAXALIGN(size);
+ return size;
}
/*
@@ -642,36 +680,202 @@ memcpyDatum(void *target, SpGistTypeDesc *att, Datum datum)
}
/*
- * Construct a leaf tuple containing the given heap TID and datum value
+ * Private version of heap_compute_data_size with start address not
+ * necessarily MAXALIGNed. The reason is that start address (and alignment)
+ * influence alignment of each of next values and overall size of included
+ * data area in SpGiST leaf tuple.
+ */
+Size
+spgIncludedDataSize(TupleDesc tupleDesc,
+ Datum *values,
+ bool *isnull, Size start)
+{
+ Size data_length = 0;
+ int i;
+ int numberOfAttributes = tupleDesc->natts;
+
+ data_length = start;
+ for (i = 0; i < numberOfAttributes; i++)
+ {
+ Datum val;
+ Form_pg_attribute atti;
+
+ if (isnull[i])
+ continue;
+
+ val = values[i];
+ atti = TupleDescAttr(tupleDesc, i);
+
+ if (ATT_IS_PACKABLE(atti) &&
+ VARATT_CAN_MAKE_SHORT(DatumGetPointer(val)))
+ {
+ /*
+ * we're anticipating converting to a short varlena header, so
+ * adjust length and don't count any alignment
+ */
+ data_length += VARATT_CONVERTED_SHORT_SIZE(DatumGetPointer(val));
+ }
+ else if (atti->attlen == -1 &&
+ VARATT_IS_EXTERNAL_EXPANDED(DatumGetPointer(val)))
+ {
+ /*
+ * we want to flatten the expanded value so that the constructed
+ * tuple doesn't depend on it
+ */
+ data_length = att_align_nominal(data_length, atti->attalign);
+ data_length += EOH_get_flat_size(DatumGetEOHP(val));
+ }
+ else
+ {
+ data_length = att_align_datum(data_length, atti->attalign,
+ atti->attlen, val);
+ data_length = att_addlength_datum(data_length, atti->attlen,
+ val);
+ }
+ }
+ return data_length - start;
+}
+
+/* Calculate overall leaf tuple size. SGLTHDRSZ is MAXALIGNed only for backward
+ * compatibility and there might be gap between header and key data. After key
+ * data there are no such gaps more than is is necessary for each value
+ * alignment. Overall result is MAXALIGNed.*/
+unsigned int
+SpgLeafSize(SpGistState *state, Datum *datum, bool *isnull)
+{
+ /* compute space needed, nullmask size and offset for include attributes */
+ unsigned int size = SGLTHDRSZ;
+ unsigned int i;
+
+ if (!isnull[0])
+ /* key attribute size (not maxaligned) */
+ size += SpGistGetTypeSize(&state->attLeafType, datum[0]);
+
+ if (state->includeTupdesc)
+ {
+ Assert(state->includeTupdesc->natts);
+ if (state->includeTupdesc->natts + 1 >= INDEX_MAX_KEYS)
+ ereport(ERROR,
+ (errcode(ERRCODE_TOO_MANY_COLUMNS),
+ errmsg("number of index columns (%d) exceeds limit (%d)",
+ state->includeTupdesc->natts, INDEX_MAX_KEYS)));
+ /* nullmask size */
+ for (i = 1; i <= state->includeTupdesc->natts; i++)
+ {
+ if (isnull[i])
+ {
+ size += (state->includeTupdesc->natts / 8) + 1;
+ break;
+ }
+ }
+ /* overall included attributes size each with added proper alignment. */
+ size += spgIncludedDataSize(state->includeTupdesc, datum + 1, isnull + 1, size);
+ }
+ return MAXALIGN(size);
+}
+
+/*
+ * Construct a leaf tuple containing the given heap TID, key data and included
+ * columns data. Key data starts from MAXALIGN boundary for backward compatibility.
+ * Nullmask apply only to included attributes and is placed just after key data if
+ * there is at least one NULL among included attributes. It doesn't need alignment.
+ * Then all included columns data follow aligned by their typealign's.
*/
SpGistLeafTuple
spgFormLeafTuple(SpGistState *state, ItemPointer heapPtr,
- Datum datum, bool isnull)
+ Datum *datum, bool *isnull)
{
SpGistLeafTuple tup;
- unsigned int size;
+ unsigned int size = SGLTHDRSZ;
+ unsigned int include_offset = 0;
+ unsigned int nullmask_size = 0;
+ unsigned int data_offset = 0;
+ unsigned int data_size = 0;
+ uint16 tupmask = 0;
+ int i;
- /* compute space needed (note result is already maxaligned) */
- size = SGLTHDRSZ;
- if (!isnull)
- size += SpGistGetTypeSize(&state->attLeafType, datum);
+ /*
+ * Calculate space needed. If there are include attributes also calculate
+ * sizes and offsets needed for heap_fill_tuple
+ */
+ if (!isnull[0])
+ /* key attribute size (not maxaligned) */
+ size += SpGistGetTypeSize(&state->attLeafType, datum[0]);
+
+ if (state->includeTupdesc)
+ {
+ Assert(state->includeTupdesc->natts);
+ if (state->includeTupdesc->natts + 1 >= INDEX_MAX_KEYS)
+ ereport(ERROR,
+ (errcode(ERRCODE_TOO_MANY_COLUMNS),
+ errmsg("number of index columns (%d) exceeds limit (%d)",
+ state->includeTupdesc->natts, INDEX_MAX_KEYS)));
+
+ include_offset = size;
+
+ for (i = 1; i <= state->includeTupdesc->natts; i++)
+ {
+ if (isnull[i])
+ {
+ nullmask_size = (state->includeTupdesc->natts / 8) + 1;
+ size += nullmask_size;
+ break;
+ }
+ }
+
+ /*
+ * Alignment of all included attributes is counted inside data_size.
+ * data_offset itself is not aligned.
+ */
+ data_size = spgIncludedDataSize(state->includeTupdesc, datum + 1, isnull + 1, size);
+ data_offset = size;
+
+ size += data_size;
+ }
/*
* Ensure that we can replace the tuple with a dead tuple later. This
- * test is unnecessary when !isnull, but let's be safe.
+ * test is unnecessary when !isnull[0], but let's be safe.
*/
if (size < SGDTSIZE)
size = SGDTSIZE;
/* OK, form the tuple */
- tup = (SpGistLeafTuple) palloc0(size);
+ tup = (SpGistLeafTuple) palloc0(MAXALIGN(size));
- tup->size = size;
- tup->nextOffset = InvalidOffsetNumber;
+ tup->size = MAXALIGN(size);
+ SGLT_SET_OFFSET(tup->nextOffset, InvalidOffsetNumber);
tup->heapPtr = *heapPtr;
- if (!isnull)
- memcpyDatum(SGLTDATAPTR(tup), &state->attLeafType, datum);
+ if (!isnull[0])
+ memcpyDatum(SGLTDATAPTR(tup), &state->attLeafType, datum[0]);
+
+ /* Add included columns data to leaf tuple if any. */
+ if (state->includeTupdesc)
+ {
+ /*
+ * The start of include attributes tuple is not aligned by default.
+ * All values alignment should be done by heap_fill_tuple
+ * automaticaly. If there is a nulls mask it is included just after
+ * key attribute data and it should not be aligned.
+ */
+ heap_fill_tuple(state->includeTupdesc, datum + 1, isnull + 1,
+ (char *) tup + data_offset,
+ data_size, &tupmask,
+ (nullmask_size ? (bits8 *) tup + include_offset : NULL));
+
+ if (nullmask_size)
+ SGLT_SET_CONTAINSNULLMASK(tup->nextOffset, 1);
+
+ /*
+ * We do this because heap_fill_tuple wants to initialize a "tupmask"
+ * which is used for HeapTuples, but the only relevant info is the
+ * "has variable attributes" field. We have already set the hasnull
+ * bit above.
+ */
+ if (tupmask & HEAP_HASVARWIDTH)
+ SGLT_SET_CONTAINSVARATT(tup->nextOffset, 1);
+ }
return tup;
}
@@ -688,10 +892,10 @@ spgFormNodeTuple(SpGistState *state, Datum label, bool isnull)
unsigned int size;
unsigned short infomask = 0;
- /* compute space needed (note result is already maxaligned) */
+ /* compute space needed */
size = SGNTHDRSZ;
if (!isnull)
- size += SpGistGetTypeSize(&state->attLabelType, label);
+ size += MAXALIGN(SpGistGetTypeSize(&state->attLabelType, label));
/*
* Here we make sure that the size will fit in the field reserved for it
@@ -735,7 +939,7 @@ spgFormInnerTuple(SpGistState *state, bool hasPrefix, Datum prefix,
/* Compute size needed */
if (hasPrefix)
- prefixSize = SpGistGetTypeSize(&state->attPrefixType, prefix);
+ prefixSize = MAXALIGN(SpGistGetTypeSize(&state->attPrefixType, prefix));
else
prefixSize = 0;
@@ -1046,3 +1250,133 @@ spgproperty(Oid index_oid, int attno,
return true;
}
+
+/*
+ * Convert an SpGist tuple into palloc'd Datum/isnull arrays.
+ *
+ */
+void
+SpGistDeformLeafTuple(SpGistLeafTuple tup, SpGistState *state, Datum *datum, bool *isnull,
+ bool key_isnull)
+{
+ unsigned int include_offset; /* offset of include data */
+ int off;
+ bits8 *nullmask_ptr = NULL; /* ptr to null bitmap in tuple */
+ char *tp;
+ bool slow = false; /* can we use/set attcacheoff? */
+ int i;
+
+ if (key_isnull)
+ {
+ datum[0] = (Datum) 0;
+ isnull[0] = true;
+ }
+ else
+ {
+ datum[0] = SGLTDATUM(tup, state);
+ isnull[0] = false;
+ }
+
+ if (state->includeTupdesc)
+ {
+ Assert(state->includeTupdesc->natts);
+ if (state->includeTupdesc->natts + 1 >= INDEX_MAX_KEYS)
+ ereport(ERROR,
+ (errcode(ERRCODE_TOO_MANY_COLUMNS),
+ errmsg("number of index columns (%d) exceeds limit (%d)",
+ state->includeTupdesc->natts, INDEX_MAX_KEYS)));
+
+ include_offset = key_isnull ? SGLTHDRSZ : SGLTHDRSZ + SpGistGetTypeSize(&state->attLeafType, datum[0]);
+
+ tp = (char *) tup;
+ off = include_offset;
+
+ if (SGLT_GET_CONTAINSNULLMASK(tup->nextOffset))
+ {
+ nullmask_ptr = (bits8 *) tp + include_offset;
+ off += (state->includeTupdesc->natts) / 8 + 1;
+ }
+
+ if (state->attLeafType.attlen > 0 && !SGLT_GET_CONTAINSVARATT(tup->nextOffset) &&
+ !SGLT_GET_CONTAINSNULLMASK(tup->nextOffset))
+ /* can use attcacheoff for all attributes */
+ {
+ for (i = 1; i <= state->includeTupdesc->natts; i++)
+ {
+ Form_pg_attribute thisatt = TupleDescAttr(state->includeTupdesc, i - 1);
+
+ isnull[i] = false;
+ if (thisatt->attcacheoff >= 0)
+ off = thisatt->attcacheoff;
+ else
+ {
+ off = att_align_nominal(off, thisatt->attalign);
+ thisatt->attcacheoff = off;
+ }
+ datum[i] = fetchatt(thisatt, tp + off);
+ off = att_addlength_pointer(off, thisatt->attlen, tp + off);
+ }
+ }
+ else
+
+ /*
+ * general case: can use cache until first null or varlen
+ * attribute
+ */
+ {
+ if (state->attLeafType.attlen <= 0)
+ slow = true; /* can't use attcacheoff at all */
+
+ for (i = 1; i <= state->includeTupdesc->natts; i++)
+ {
+ Form_pg_attribute thisatt = TupleDescAttr(state->includeTupdesc, i - 1);
+
+ if (SGLT_GET_CONTAINSNULLMASK(tup->nextOffset))
+ {
+ if (att_isnull(i - 1, nullmask_ptr))
+ {
+ datum[i] = (Datum) 0;
+ isnull[i] = true;
+ slow = true; /* can't use attcacheoff anymore */
+ continue;
+ }
+ }
+
+ isnull[i] = false;
+
+ if (!slow && thisatt->attcacheoff >= 0)
+ off = thisatt->attcacheoff;
+ else if (thisatt->attlen == -1)
+ {
+ /*
+ * We can only cache the offset for a varlena attribute if
+ * the offset is already suitably aligned, so that there
+ * would be no pad bytes in any case: then the offset will
+ * be valid for either an aligned or unaligned value.
+ */
+ if (!slow && off == att_align_nominal(off, thisatt->attalign))
+ thisatt->attcacheoff = off;
+ else
+ {
+ off = att_align_pointer(off, thisatt->attalign, -1, tp + off);
+ slow = true;
+ }
+ }
+ else
+ {
+ /* not varlena, so safe to use att_align_nominal */
+ off = att_align_nominal(off, thisatt->attalign);
+
+ if (!slow)
+ thisatt->attcacheoff = off;
+ }
+
+ datum[i] = fetchatt(thisatt, tp + off);
+ off = att_addlength_pointer(off, thisatt->attlen, tp + off);
+
+ if (thisatt->attlen <= 0)
+ slow = true; /* can't use attcacheoff anymore */
+ }
+ }
+ }
+}
diff --git a/src/backend/access/spgist/spgvacuum.c b/src/backend/access/spgist/spgvacuum.c
index bd98707f3c..a0d76901fc 100644
--- a/src/backend/access/spgist/spgvacuum.c
+++ b/src/backend/access/spgist/spgvacuum.c
@@ -168,23 +168,28 @@ vacuumLeafPage(spgBulkDeleteState *bds, Relation index, Buffer buffer,
}
/* Form predecessor map, too */
- if (lt->nextOffset != InvalidOffsetNumber)
+ if (SGLT_GET_OFFSET(lt->nextOffset) != InvalidOffsetNumber)
{
/* paranoia about corrupted chain links */
- if (lt->nextOffset < FirstOffsetNumber ||
- lt->nextOffset > max ||
- predecessor[lt->nextOffset] != InvalidOffsetNumber)
+ if (SGLT_GET_OFFSET(lt->nextOffset) < FirstOffsetNumber ||
+ SGLT_GET_OFFSET(lt->nextOffset) > max ||
+ predecessor[SGLT_GET_OFFSET(lt->nextOffset)] != InvalidOffsetNumber)
elog(ERROR, "inconsistent tuple chain links in page %u of index \"%s\"",
BufferGetBlockNumber(buffer),
RelationGetRelationName(index));
- predecessor[lt->nextOffset] = i;
+ predecessor[SGLT_GET_OFFSET(lt->nextOffset)] = i;
}
}
else if (lt->tupstate == SPGIST_REDIRECT)
{
SpGistDeadTuple dt = (SpGistDeadTuple) lt;
- Assert(dt->nextOffset == InvalidOffsetNumber);
+ /*
+ * Dead tuple nextOffset is allowed to have any values of two
+ * highest bits in case it is inherited from SpGistLeafTuple where
+ * these bits has their own meaning.
+ */
+ Assert(SGLT_GET_OFFSET(dt->nextOffset) == InvalidOffsetNumber);
Assert(ItemPointerIsValid(&dt->pointer));
/*
@@ -201,7 +206,7 @@ vacuumLeafPage(spgBulkDeleteState *bds, Relation index, Buffer buffer,
}
else
{
- Assert(lt->nextOffset == InvalidOffsetNumber);
+ Assert(SGLT_GET_OFFSET(lt->nextOffset) == InvalidOffsetNumber);
}
}
@@ -250,7 +255,7 @@ vacuumLeafPage(spgBulkDeleteState *bds, Relation index, Buffer buffer,
prevLive = deletable[i] ? InvalidOffsetNumber : i;
/* scan down the chain ... */
- j = head->nextOffset;
+ j = SGLT_GET_OFFSET(head->nextOffset);
while (j != InvalidOffsetNumber)
{
SpGistLeafTuple lt;
@@ -301,7 +306,7 @@ vacuumLeafPage(spgBulkDeleteState *bds, Relation index, Buffer buffer,
interveningDeletable = false;
}
- j = lt->nextOffset;
+ j = SGLT_GET_OFFSET(lt->nextOffset);
}
if (prevLive == InvalidOffsetNumber)
@@ -366,7 +371,7 @@ vacuumLeafPage(spgBulkDeleteState *bds, Relation index, Buffer buffer,
lt = (SpGistLeafTuple) PageGetItem(page,
PageGetItemId(page, chainSrc[i]));
Assert(lt->tupstate == SPGIST_LIVE);
- lt->nextOffset = chainDest[i];
+ SGLT_SET_OFFSET(lt->nextOffset, chainDest[i]);
}
MarkBufferDirty(buffer);
diff --git a/src/backend/access/spgist/spgxlog.c b/src/backend/access/spgist/spgxlog.c
index 7be2291d07..4022e3af07 100644
--- a/src/backend/access/spgist/spgxlog.c
+++ b/src/backend/access/spgist/spgxlog.c
@@ -122,8 +122,8 @@ spgRedoAddLeaf(XLogReaderState *record)
head = (SpGistLeafTuple) PageGetItem(page,
PageGetItemId(page, xldata->offnumHeadLeaf));
- Assert(head->nextOffset == leafTupleHdr.nextOffset);
- head->nextOffset = xldata->offnumLeaf;
+ Assert(SGLT_GET_OFFSET(head->nextOffset) == SGLT_GET_OFFSET(leafTupleHdr.nextOffset));
+ SGLT_SET_OFFSET(head->nextOffset, xldata->offnumLeaf);
}
}
else
@@ -822,7 +822,7 @@ spgRedoVacuumLeaf(XLogReaderState *record)
lt = (SpGistLeafTuple) PageGetItem(page,
PageGetItemId(page, chainSrc[i]));
Assert(lt->tupstate == SPGIST_LIVE);
- lt->nextOffset = chainDest[i];
+ SGLT_SET_OFFSET(lt->nextOffset, chainDest[i]);
}
PageSetLSN(page, lsn);
diff --git a/src/include/access/spgist_private.h b/src/include/access/spgist_private.h
index 00b98ec6a0..8d03adb8f5 100644
--- a/src/include/access/spgist_private.h
+++ b/src/include/access/spgist_private.h
@@ -141,6 +141,7 @@ typedef struct SpGistState
SpGistTypeDesc attLeafType; /* type of leaf-tuple values */
SpGistTypeDesc attPrefixType; /* type of inner-tuple prefix values */
SpGistTypeDesc attLabelType; /* type of node label values */
+ TupleDesc includeTupdesc; /* tuple descriptor of included columns */
char *deadTupleStorage; /* workspace for spgFormDeadTuple */
@@ -148,6 +149,98 @@ typedef struct SpGistState
bool isBuild; /* true if doing index build */
} SpGistState;
+/*
+ * SPGiST leaf tuple: carries a datum and a heap tuple TID
+ *
+ * In the simplest case, the datum is the same as the indexed value; but
+ * it could also be a suffix or some other sort of delta that permits
+ * reconstruction given knowledge of the prefix path traversed to get here.
+ *
+ * The size field is wider than could possibly be needed for an on-disk leaf
+ * tuple, but this allows us to form leaf tuples even when the datum is too
+ * wide to be stored immediately, and it costs nothing because of alignment
+ * considerations.
+ *
+ * Normally, nextOffset links to the next tuple belonging to the same parent
+ * node (which must be on the same page). But when the root page is a leaf
+ * page, we don't chain its tuples, so nextOffset is always 0 on the root.
+ *
+ * size must be a multiple of MAXALIGN; also, it must be at least SGDTSIZE
+ * so that the tuple can be converted to REDIRECT status later. (This
+ * restriction only adds bytes for the null-datum case, otherwise alignment
+ * restrictions force it anyway.)
+ *
+ * In a leaf tuple for a NULL indexed value, there's no useful datum value;
+ * however, the SGDTSIZE limit ensures that's there's a Datum word there
+ * anyway, so SGLTDATUM can be applied safely as long as you don't do
+ * anything with the result.
+ *
+ * Minimum space to store SpGistLeafTuple on a page is 12 bytes tuple header
+ * and 4 bytes ItemIdData so 14 lower bits of nextOffset (accessed as
+ * SGLT_GET/SET_OFFSET) is enough to store actual tuple number on a page even
+ * if page size is 64Kb. Two higher bits are to store per-tuple
+ * information is there nulls mask exist and is there any included attribute
+ * of variable length type.
+ */
+
+typedef struct SpGistLeafTupleData
+{
+ unsigned int tupstate:2, /* LIVE/REDIRECT/DEAD/PLACEHOLDER */
+ size:30; /* large enough for any palloc'able value */
+ OffsetNumber nextOffset; /* higher 1 bit = 1 if included values has
+ * nulls, 2 bit = 1 if included values contain
+ * variable length values, lower 15 bits - is
+ * "actual" nextOffset i.e. number of next
+ * tuple in chain on a page, or
+ * InvalidOffsetNumber. They SHOULD NOT be
+ * set/read directly,
+ * SGLT_SET_XXX/SGLT_GET_XXX macros must be
+ * used instead. */
+ ItemPointerData heapPtr; /* TID of represented heap tuple */
+ /* leaf datum follows */
+
+ /*
+ * if SGLT_GET_CONTAINSNULLMASK nullmask follows. Its size (number of
+ * included columns/8)+1
+ */
+ /* include attributes follow if any */
+} SpGistLeafTupleData;
+
+typedef SpGistLeafTupleData *SpGistLeafTuple;
+
+#define SGLTHDRSZ MAXALIGN(sizeof(SpGistLeafTupleData))
+#define SGLTDATAPTR(x) (((char *) (x)) + SGLTHDRSZ)
+#define SGLTDATUM(x, s) ((s)->attLeafType.attbyval ? \
+ *(Datum *) SGLTDATAPTR(x) : \
+ PointerGetDatum(SGLTDATAPTR(x)))
+/*
+ * Accessor macros to get and set actual 14-bit offset and two bit flags from/to
+ * nextOffset value.
+ */
+#define SGLT_GET_OFFSET(x) ( (x) & 0x3FFF )
+#define SGLT_GET_CONTAINSNULLMASK(x) ( (x) >> 15 )
+#define SGLT_GET_CONTAINSVARATT(x) ( ( (x) & 4000 ) >> 14 )
+#define SGLT_SET_OFFSET(x,o) ( (x) = ( (x) & 0xC000 ) | ( (o) & 0x3FFF) )
+#define SGLT_SET_CONTAINSNULLMASK(x,n) ( (x) = ( (n) << 15 ) | ( (x) & 0x3FFF ) )
+#define SGLT_SET_CONTAINSVARATT(x,v) ( (x) = ( (v) << 14 ) | ( (x) & 0xBFFF ) )
+
+#define SGLT_GET_INCLUDE_TUPSIZE(x) SGLT_GET_OFFSET(x)
+#define SGLT_SET_INCLUDE_TUPSIZE(x,o) SGLT_SET_OFFSET(x,o)
+
+extern char *SpGistFormIncludeTuple(TupleDesc tupleDescriptor, Datum *values,
+ bool *isnull, uint16 *tupdata);
+
+/*
+ * SPGiST dead tuple: declaration for examining non-live tuples
+ *
+ * The tupstate field of this struct must match those of regular inner and
+ * leaf tuples, and its size field must match a leaf tuple's.
+ * Also, the pointer field must be in the same place as a leaf tuple's heapPtr
+ * field, to satisfy some Asserts that we make when replacing a leaf tuple
+ * with a dead tuple.
+ * We don't use nextOffset, but it's needed to align the pointer field.
+ */
+
typedef struct SpGistSearchItem
{
pairingheap_node phNode; /* pairing heap node */
@@ -160,14 +253,14 @@ typedef struct SpGistSearchItem
bool isLeaf; /* SearchItem is heap item */
bool recheck; /* qual recheck is needed */
bool recheckDistances; /* distance recheck is needed */
-
+ SpGistLeafTuple leafTuple;
/* array with numberOfOrderBys entries */
double distances[FLEXIBLE_ARRAY_MEMBER];
+ /* if there are include columns SpGistLeafTupleData follow */
} SpGistSearchItem;
#define SizeOfSpGistSearchItem(n_distances) \
(offsetof(SpGistSearchItem, distances) + sizeof(double) * (n_distances))
-
/*
* Private state of an index scan
*/
@@ -241,6 +334,7 @@ typedef struct SpGistCache
SpGistTypeDesc attLeafType; /* type of leaf-tuple values */
SpGistTypeDesc attPrefixType; /* type of inner-tuple prefix values */
SpGistTypeDesc attLabelType; /* type of node label values */
+ TupleDesc includeTupdesc;
SpGistLUPCache lastUsedPages; /* local storage of last-used info */
} SpGistCache;
@@ -321,60 +415,6 @@ typedef SpGistNodeTupleData *SpGistNodeTuple;
*(Datum *) SGNTDATAPTR(x) : \
PointerGetDatum(SGNTDATAPTR(x)))
-/*
- * SPGiST leaf tuple: carries a datum and a heap tuple TID
- *
- * In the simplest case, the datum is the same as the indexed value; but
- * it could also be a suffix or some other sort of delta that permits
- * reconstruction given knowledge of the prefix path traversed to get here.
- *
- * The size field is wider than could possibly be needed for an on-disk leaf
- * tuple, but this allows us to form leaf tuples even when the datum is too
- * wide to be stored immediately, and it costs nothing because of alignment
- * considerations.
- *
- * Normally, nextOffset links to the next tuple belonging to the same parent
- * node (which must be on the same page). But when the root page is a leaf
- * page, we don't chain its tuples, so nextOffset is always 0 on the root.
- *
- * size must be a multiple of MAXALIGN; also, it must be at least SGDTSIZE
- * so that the tuple can be converted to REDIRECT status later. (This
- * restriction only adds bytes for the null-datum case, otherwise alignment
- * restrictions force it anyway.)
- *
- * In a leaf tuple for a NULL indexed value, there's no useful datum value;
- * however, the SGDTSIZE limit ensures that's there's a Datum word there
- * anyway, so SGLTDATUM can be applied safely as long as you don't do
- * anything with the result.
- */
-typedef struct SpGistLeafTupleData
-{
- unsigned int tupstate:2, /* LIVE/REDIRECT/DEAD/PLACEHOLDER */
- size:30; /* large enough for any palloc'able value */
- OffsetNumber nextOffset; /* next tuple in chain, or InvalidOffsetNumber */
- ItemPointerData heapPtr; /* TID of represented heap tuple */
- /* leaf datum follows */
-} SpGistLeafTupleData;
-
-typedef SpGistLeafTupleData *SpGistLeafTuple;
-
-#define SGLTHDRSZ MAXALIGN(sizeof(SpGistLeafTupleData))
-#define SGLTDATAPTR(x) (((char *) (x)) + SGLTHDRSZ)
-#define SGLTDATUM(x, s) ((s)->attLeafType.attbyval ? \
- *(Datum *) SGLTDATAPTR(x) : \
- PointerGetDatum(SGLTDATAPTR(x)))
-
-/*
- * SPGiST dead tuple: declaration for examining non-live tuples
- *
- * The tupstate field of this struct must match those of regular inner and
- * leaf tuples, and its size field must match a leaf tuple's.
- * Also, the pointer field must be in the same place as a leaf tuple's heapPtr
- * field, to satisfy some Asserts that we make when replacing a leaf tuple
- * with a dead tuple.
- * We don't use nextOffset, but it's needed to align the pointer field.
- * pointer and xid are only valid when tupstate = REDIRECT.
- */
typedef struct SpGistDeadTupleData
{
unsigned int tupstate:2, /* LIVE/REDIRECT/DEAD/PLACEHOLDER */
@@ -394,7 +434,6 @@ typedef SpGistDeadTupleData *SpGistDeadTuple;
* size plus sizeof(ItemIdData) (for the line pointer). This works correctly
* so long as tuple sizes are always maxaligned.
*/
-
/* Page capacity after allowing for fixed header and special space */
#define SPGIST_PAGE_CAPACITY \
MAXALIGN_DOWN(BLCKSZ - \
@@ -456,9 +495,10 @@ extern void SpGistInitPage(Page page, uint16 f);
extern void SpGistInitBuffer(Buffer b, uint16 f);
extern void SpGistInitMetapage(Page page);
extern unsigned int SpGistGetTypeSize(SpGistTypeDesc *att, Datum datum);
+extern unsigned int SpgLeafSize(SpGistState *state, Datum *datum, bool *isnull);
extern SpGistLeafTuple spgFormLeafTuple(SpGistState *state,
ItemPointer heapPtr,
- Datum datum, bool isnull);
+ Datum *datum, bool *isnull);
extern SpGistNodeTuple spgFormNodeTuple(SpGistState *state,
Datum label, bool isnull);
extern SpGistInnerTuple spgFormInnerTuple(SpGistState *state,
@@ -466,6 +506,8 @@ extern SpGistInnerTuple spgFormInnerTuple(SpGistState *state,
int nNodes, SpGistNodeTuple *nodes);
extern SpGistDeadTuple spgFormDeadTuple(SpGistState *state, int tupstate,
BlockNumber blkno, OffsetNumber offnum);
+extern void SpGistDeformLeafTuple(SpGistLeafTuple tup, SpGistState *state,
+ Datum *datum, bool *isnull, bool key_value_isnull);
extern Datum *spgExtractNodeLabels(SpGistState *state,
SpGistInnerTuple innerTuple);
extern OffsetNumber SpGistPageAddNewItem(SpGistState *state, Page page,
@@ -484,7 +526,7 @@ extern void spgPageIndexMultiDelete(SpGistState *state, Page page,
int firststate, int reststate,
BlockNumber blkno, OffsetNumber offnum);
extern bool spgdoinsert(Relation index, SpGistState *state,
- ItemPointer heapPtr, Datum datum, bool isnull);
+ ItemPointer heapPtr, Datum *datum, bool *isnull);
/* spgproc.c */
extern double *spg_key_orderbys_distances(Datum key, bool isLeaf,
diff --git a/src/test/regress/expected/amutils.out b/src/test/regress/expected/amutils.out
index d92a6d12c6..93e6a43b6d 100644
--- a/src/test/regress/expected/amutils.out
+++ b/src/test/regress/expected/amutils.out
@@ -169,9 +169,9 @@ select amname, prop, pg_indexam_has_property(a.oid, prop) as p
hash | bogus |
spgist | can_order | f
spgist | can_unique | f
- spgist | can_multi_col | f
+ spgist | can_multi_col | t
spgist | can_exclude | t
- spgist | can_include | f
+ spgist | can_include | t
spgist | bogus |
(36 rows)
diff --git a/src/test/regress/expected/index_including.out b/src/test/regress/expected/index_including.out
index 8e5d53e712..4fd2b7e878 100644
--- a/src/test/regress/expected/index_including.out
+++ b/src/test/regress/expected/index_including.out
@@ -356,7 +356,6 @@ CREATE INDEX on tbl USING brin(c1, c2) INCLUDE (c3, c4);
ERROR: access method "brin" does not support included columns
CREATE INDEX on tbl USING gist(c3) INCLUDE (c1, c4);
CREATE INDEX on tbl USING spgist(c3) INCLUDE (c4);
-ERROR: access method "spgist" does not support included columns
CREATE INDEX on tbl USING gin(c1, c2) INCLUDE (c3, c4);
ERROR: access method "gin" does not support included columns
CREATE INDEX on tbl USING hash(c1, c2) INCLUDE (c3, c4);
diff --git a/src/test/regress/expected/index_including_spgist.out b/src/test/regress/expected/index_including_spgist.out
new file mode 100644
index 0000000000..fa64766fb7
--- /dev/null
+++ b/src/test/regress/expected/index_including_spgist.out
@@ -0,0 +1,139 @@
+/*
+ * 1.1. test CREATE INDEX with buffered build
+ */
+-- Regular index with included columns
+CREATE TABLE tbl_spgist (c1 int, c2 int, c3 int, c4 box);
+-- size is chosen to exceed page size and trigger actual truncation
+INSERT INTO tbl_spgist SELECT x, 2*x, 3*x, box(point(x,x+1),point(2*x,2*x+1)) FROM generate_series(1,8000) AS x;
+CREATE INDEX tbl_spgist_idx ON tbl_spgist using spgist (c4) INCLUDE (c1,c2,c3);
+SELECT pg_get_indexdef(i.indexrelid)
+FROM pg_index i JOIN pg_class c ON i.indexrelid = c.oid
+WHERE i.indrelid = 'tbl_spgist'::regclass ORDER BY c.relname;
+ pg_get_indexdef
+-----------------------------------------------------------------------------------------
+ CREATE INDEX tbl_spgist_idx ON public.tbl_spgist USING spgist (c4) INCLUDE (c1, c2, c3)
+(1 row)
+
+SELECT * FROM tbl_spgist where c4 <@ box(point(1,1),point(10,10));
+ c1 | c2 | c3 | c4
+----+----+----+-------------
+ 1 | 2 | 3 | (2,3),(1,2)
+ 2 | 4 | 6 | (4,5),(2,3)
+ 3 | 6 | 9 | (6,7),(3,4)
+ 4 | 8 | 12 | (8,9),(4,5)
+(4 rows)
+
+SET enable_bitmapscan TO off;
+EXPLAIN (costs off) SELECT * FROM tbl_spgist where c4 <@ box(point(1,1),point(10,10));
+ QUERY PLAN
+----------------------------------------------------
+ Index Only Scan using tbl_spgist_idx on tbl_spgist
+ Index Cond: (c4 <@ '(10,10),(1,1)'::box)
+(2 rows)
+
+SET enable_bitmapscan TO default;
+DROP TABLE tbl_spgist;
+/*
+ * 1.2. test CREATE INDEX with inserts
+ */
+-- Regular index with included columns
+CREATE TABLE tbl_spgist (c1 int, c2 int, c3 int, c4 box);
+-- size is chosen to exceed page size and trigger actual truncation
+CREATE INDEX tbl_spgist_idx ON tbl_spgist using spgist (c4) INCLUDE (c1,c2,c3);
+INSERT INTO tbl_spgist SELECT x, 2*x, 3*x, box(point(x,x+1),point(2*x,2*x+1)) FROM generate_series(1,8000) AS x;
+SELECT pg_get_indexdef(i.indexrelid)
+FROM pg_index i JOIN pg_class c ON i.indexrelid = c.oid
+WHERE i.indrelid = 'tbl_spgist'::regclass ORDER BY c.relname;
+ pg_get_indexdef
+-----------------------------------------------------------------------------------------
+ CREATE INDEX tbl_spgist_idx ON public.tbl_spgist USING spgist (c4) INCLUDE (c1, c2, c3)
+(1 row)
+
+SELECT * FROM tbl_spgist where c4 <@ box(point(1,1),point(10,10));
+ c1 | c2 | c3 | c4
+----+----+----+-------------
+ 1 | 2 | 3 | (2,3),(1,2)
+ 2 | 4 | 6 | (4,5),(2,3)
+ 3 | 6 | 9 | (6,7),(3,4)
+ 4 | 8 | 12 | (8,9),(4,5)
+(4 rows)
+
+SET enable_bitmapscan TO off;
+EXPLAIN (costs off) SELECT * FROM tbl_spgist where c4 <@ box(point(1,1),point(10,10));
+ QUERY PLAN
+----------------------------------------------------
+ Index Only Scan using tbl_spgist_idx on tbl_spgist
+ Index Cond: (c4 <@ '(10,10),(1,1)'::box)
+(2 rows)
+
+SET enable_bitmapscan TO default;
+DROP TABLE tbl_spgist;
+/*
+ * 2. CREATE INDEX CONCURRENTLY
+ */
+CREATE TABLE tbl_spgist (c1 int, c2 int, c3 int, c4 box);
+INSERT INTO tbl_spgist SELECT x, 2*x, 3*x, box(point(x,x+1),point(2*x,2*x+1)) FROM generate_series(1,10) AS x;
+CREATE INDEX CONCURRENTLY tbl_spgist_idx ON tbl_spgist using spgist (c4) INCLUDE (c1,c2,c3);
+SELECT indexdef FROM pg_indexes WHERE tablename = 'tbl_spgist' ORDER BY indexname;
+ indexdef
+-----------------------------------------------------------------------------------------
+ CREATE INDEX tbl_spgist_idx ON public.tbl_spgist USING spgist (c4) INCLUDE (c1, c2, c3)
+(1 row)
+
+DROP TABLE tbl_spgist;
+/*
+ * 3. REINDEX
+ */
+CREATE TABLE tbl_spgist (c1 int, c2 int, c3 int, c4 box);
+INSERT INTO tbl_spgist SELECT x, 2*x, 3*x, box(point(x,x+1),point(2*x,2*x+1)) FROM generate_series(1,10) AS x;
+CREATE INDEX tbl_spgist_idx ON tbl_spgist using spgist (c4) INCLUDE (c1,c3);
+SELECT indexdef FROM pg_indexes WHERE tablename = 'tbl_spgist' ORDER BY indexname;
+ indexdef
+-------------------------------------------------------------------------------------
+ CREATE INDEX tbl_spgist_idx ON public.tbl_spgist USING spgist (c4) INCLUDE (c1, c3)
+(1 row)
+
+REINDEX INDEX tbl_spgist_idx;
+SELECT indexdef FROM pg_indexes WHERE tablename = 'tbl_spgist' ORDER BY indexname;
+ indexdef
+-------------------------------------------------------------------------------------
+ CREATE INDEX tbl_spgist_idx ON public.tbl_spgist USING spgist (c4) INCLUDE (c1, c3)
+(1 row)
+
+ALTER TABLE tbl_spgist DROP COLUMN c1;
+SELECT indexdef FROM pg_indexes WHERE tablename = 'tbl_spgist' ORDER BY indexname;
+ indexdef
+----------
+(0 rows)
+
+DROP TABLE tbl_spgist;
+/*
+ * 4. Update, delete values in indexed table.
+ */
+CREATE TABLE tbl_spgist (c1 int, c2 int, c3 int, c4 box);
+INSERT INTO tbl_spgist SELECT x, 2*x, 3*x, box(point(x,x+1),point(2*x,2*x+1)) FROM generate_series(1,10) AS x;
+CREATE INDEX tbl_spgist_idx ON tbl_spgist using spgist (c4) INCLUDE (c1,c3);
+UPDATE tbl_spgist SET c1 = 100 WHERE c1 = 2;
+UPDATE tbl_spgist SET c1 = 1 WHERE c1 = 3;
+DELETE FROM tbl_spgist WHERE c1 = 5 OR c3 = 12;
+DROP TABLE tbl_spgist;
+/*
+ * 5. Alter column type.
+ */
+CREATE TABLE tbl_spgist (c1 int, c2 int, c3 int, c4 box);
+INSERT INTO tbl_spgist SELECT x, 2*x, 3*x, box(point(x,x+1),point(2*x,2*x+1)) FROM generate_series(1,10) AS x;
+CREATE INDEX tbl_spgist_idx ON tbl_spgist using spgist (c4) INCLUDE (c1,c3);
+ALTER TABLE tbl_spgist ALTER c1 TYPE bigint;
+ALTER TABLE tbl_spgist ALTER c3 TYPE bigint;
+\d tbl_spgist
+ Table "public.tbl_spgist"
+ Column | Type | Collation | Nullable | Default
+--------+---------+-----------+----------+---------
+ c1 | bigint | | |
+ c2 | integer | | |
+ c3 | bigint | | |
+ c4 | box | | |
+Indexes:
+ "tbl_spgist_idx" spgist (c4) INCLUDE (c1, c3)
+
+DROP TABLE tbl_spgist;
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 026ea880cd..985458a1a8 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -50,7 +50,7 @@ test: copy copyselect copydml insert insert_conflict
# ----------
test: create_misc create_operator create_procedure
# These depend on create_misc and create_operator
-test: create_index create_index_spgist create_view index_including index_including_gist
+test: create_index create_index_spgist create_view index_including index_including_gist index_including_spgist
# ----------
# Another group of parallel tests
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index 979d926119..f3df961535 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -68,6 +68,7 @@ test: create_index_spgist
test: create_view
test: index_including
test: index_including_gist
+test: index_including_spgist
test: create_aggregate
test: create_function_3
test: create_cast
diff --git a/src/test/regress/sql/index_including_spgist.sql b/src/test/regress/sql/index_including_spgist.sql
new file mode 100644
index 0000000000..a59e73aa22
--- /dev/null
+++ b/src/test/regress/sql/index_including_spgist.sql
@@ -0,0 +1,81 @@
+/*
+ * 1.1. test CREATE INDEX with buffered build
+ */
+
+-- Regular index with included columns
+CREATE TABLE tbl_spgist (c1 int, c2 int, c3 int, c4 box);
+-- size is chosen to exceed page size and trigger actual truncation
+INSERT INTO tbl_spgist SELECT x, 2*x, 3*x, box(point(x,x+1),point(2*x,2*x+1)) FROM generate_series(1,8000) AS x;
+CREATE INDEX tbl_spgist_idx ON tbl_spgist using spgist (c4) INCLUDE (c1,c2,c3);
+SELECT pg_get_indexdef(i.indexrelid)
+FROM pg_index i JOIN pg_class c ON i.indexrelid = c.oid
+WHERE i.indrelid = 'tbl_spgist'::regclass ORDER BY c.relname;
+SELECT * FROM tbl_spgist where c4 <@ box(point(1,1),point(10,10));
+SET enable_bitmapscan TO off;
+EXPLAIN (costs off) SELECT * FROM tbl_spgist where c4 <@ box(point(1,1),point(10,10));
+SET enable_bitmapscan TO default;
+DROP TABLE tbl_spgist;
+
+/*
+ * 1.2. test CREATE INDEX with inserts
+ */
+
+-- Regular index with included columns
+CREATE TABLE tbl_spgist (c1 int, c2 int, c3 int, c4 box);
+-- size is chosen to exceed page size and trigger actual truncation
+CREATE INDEX tbl_spgist_idx ON tbl_spgist using spgist (c4) INCLUDE (c1,c2,c3);
+INSERT INTO tbl_spgist SELECT x, 2*x, 3*x, box(point(x,x+1),point(2*x,2*x+1)) FROM generate_series(1,8000) AS x;
+SELECT pg_get_indexdef(i.indexrelid)
+FROM pg_index i JOIN pg_class c ON i.indexrelid = c.oid
+WHERE i.indrelid = 'tbl_spgist'::regclass ORDER BY c.relname;
+SELECT * FROM tbl_spgist where c4 <@ box(point(1,1),point(10,10));
+SET enable_bitmapscan TO off;
+EXPLAIN (costs off) SELECT * FROM tbl_spgist where c4 <@ box(point(1,1),point(10,10));
+SET enable_bitmapscan TO default;
+DROP TABLE tbl_spgist;
+
+/*
+ * 2. CREATE INDEX CONCURRENTLY
+ */
+CREATE TABLE tbl_spgist (c1 int, c2 int, c3 int, c4 box);
+INSERT INTO tbl_spgist SELECT x, 2*x, 3*x, box(point(x,x+1),point(2*x,2*x+1)) FROM generate_series(1,10) AS x;
+CREATE INDEX CONCURRENTLY tbl_spgist_idx ON tbl_spgist using spgist (c4) INCLUDE (c1,c2,c3);
+SELECT indexdef FROM pg_indexes WHERE tablename = 'tbl_spgist' ORDER BY indexname;
+DROP TABLE tbl_spgist;
+
+
+/*
+ * 3. REINDEX
+ */
+CREATE TABLE tbl_spgist (c1 int, c2 int, c3 int, c4 box);
+INSERT INTO tbl_spgist SELECT x, 2*x, 3*x, box(point(x,x+1),point(2*x,2*x+1)) FROM generate_series(1,10) AS x;
+CREATE INDEX tbl_spgist_idx ON tbl_spgist using spgist (c4) INCLUDE (c1,c3);
+SELECT indexdef FROM pg_indexes WHERE tablename = 'tbl_spgist' ORDER BY indexname;
+REINDEX INDEX tbl_spgist_idx;
+SELECT indexdef FROM pg_indexes WHERE tablename = 'tbl_spgist' ORDER BY indexname;
+ALTER TABLE tbl_spgist DROP COLUMN c1;
+SELECT indexdef FROM pg_indexes WHERE tablename = 'tbl_spgist' ORDER BY indexname;
+DROP TABLE tbl_spgist;
+
+/*
+ * 4. Update, delete values in indexed table.
+ */
+CREATE TABLE tbl_spgist (c1 int, c2 int, c3 int, c4 box);
+INSERT INTO tbl_spgist SELECT x, 2*x, 3*x, box(point(x,x+1),point(2*x,2*x+1)) FROM generate_series(1,10) AS x;
+CREATE INDEX tbl_spgist_idx ON tbl_spgist using spgist (c4) INCLUDE (c1,c3);
+UPDATE tbl_spgist SET c1 = 100 WHERE c1 = 2;
+UPDATE tbl_spgist SET c1 = 1 WHERE c1 = 3;
+DELETE FROM tbl_spgist WHERE c1 = 5 OR c3 = 12;
+DROP TABLE tbl_spgist;
+
+/*
+ * 5. Alter column type.
+ */
+CREATE TABLE tbl_spgist (c1 int, c2 int, c3 int, c4 box);
+INSERT INTO tbl_spgist SELECT x, 2*x, 3*x, box(point(x,x+1),point(2*x,2*x+1)) FROM generate_series(1,10) AS x;
+CREATE INDEX tbl_spgist_idx ON tbl_spgist using spgist (c4) INCLUDE (c1,c3);
+ALTER TABLE tbl_spgist ALTER c1 TYPE bigint;
+ALTER TABLE tbl_spgist ALTER c3 TYPE bigint;
+\d tbl_spgist
+DROP TABLE tbl_spgist;
+
--
2.28.0
With a little bugfix
вт, 11 авг. 2020 г. в 22:50, Pavel Borisov <pashkin.elfe@gmail.com>:
Show quoted text
вт, 11 авг. 2020 г. в 12:11, Pavel Borisov <pashkin.elfe@gmail.com>:
I added changes in documentation into the patch.
--
Best regards,
Pavel BorisovPostgres Professional: http://postgrespro.com
<http://www.postgrespro.com>
Attachments:
v6-0001-Covering-SP-GiST-index-support-for-INCLUDE-column.patchapplication/octet-stream; name=v6-0001-Covering-SP-GiST-index-support-for-INCLUDE-column.patchDownload
From ec819baf34b69ff2a3ba3e882e167ab1d3030b75 Mon Sep 17 00:00:00 2001
From: Pavel Borisov <pashkin.elfe@gmail.com>
Date: Mon, 17 Aug 2020 20:01:48 +0400
Subject: [PATCH v6] Covering SP-GiST index - support for INCLUDE columns
Adding INCLUDE colums for SPGiST index is intended to increase the speed of queries by making scan index only likewise
in btree and GiST index. These included values are added only to leaf tuples and they are not used in index tree search
but they can be fetched during index scan.
The other point of included columns is to overcome SP-GiST limitation of being single-column in principle. I.e. in
certain cases a single covering SP-GiST index can replace several separate ones with less disk space and shared buffers
consumption, faster update etc. Also there can be included any data types without SP-GiST supported opclasses.
Discussion: https://www.postgresql.org/message-id/flat/CALT9ZEFi-vMp4faht9f9Junb1nO3NOSjhpxTmbm1UGLMsLqiEQ@mail.gmail.com
---
doc/src/sgml/indices.sgml | 4 +-
doc/src/sgml/ref/create_index.sgml | 4 +-
doc/src/sgml/spgist.sgml | 8 +
src/backend/access/spgist/README | 2 +-
src/backend/access/spgist/spgdoinsert.c | 172 +++++---
src/backend/access/spgist/spginsert.c | 5 +-
src/backend/access/spgist/spgscan.c | 87 +++-
src/backend/access/spgist/spgutils.c | 385 ++++++++++++++++--
src/backend/access/spgist/spgvacuum.c | 25 +-
src/backend/access/spgist/spgxlog.c | 6 +-
src/include/access/spgist_private.h | 160 +++++---
src/test/regress/expected/amutils.out | 4 +-
src/test/regress/expected/index_including.out | 1 -
.../expected/index_including_spgist.out | 139 +++++++
src/test/regress/parallel_schedule | 2 +-
src/test/regress/serial_schedule | 1 +
.../regress/sql/index_including_spgist.sql | 81 ++++
17 files changed, 901 insertions(+), 185 deletions(-)
create mode 100644 src/test/regress/expected/index_including_spgist.out
create mode 100644 src/test/regress/sql/index_including_spgist.sql
diff --git a/doc/src/sgml/indices.sgml b/doc/src/sgml/indices.sgml
index 28adaba72d..c89cc6cb08 100644
--- a/doc/src/sgml/indices.sgml
+++ b/doc/src/sgml/indices.sgml
@@ -1194,8 +1194,8 @@ CREATE UNIQUE INDEX tab_x_y ON tab(x) INCLUDE (y);
likely to not need to access the heap. If the heap tuple must be visited
anyway, it costs nothing more to get the column's value from there.
Other restrictions are that expressions are not currently supported as
- included columns, and that only B-tree and GiST indexes currently support
- included columns.
+ included columns, and that only B-tree, GiST and SP-GiST indexes currently
+ support included columns.
</para>
<para>
diff --git a/doc/src/sgml/ref/create_index.sgml b/doc/src/sgml/ref/create_index.sgml
index ff87b2d28f..3d360bcf47 100644
--- a/doc/src/sgml/ref/create_index.sgml
+++ b/doc/src/sgml/ref/create_index.sgml
@@ -187,8 +187,8 @@ CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ [ IF NOT EXISTS ] <replaceable class=
</para>
<para>
- Currently, the B-tree and the GiST index access methods support this
- feature. In B-tree and the GiST indexes, the values of columns listed
+ Currently, the B-tree, GiST and SP-GiST index access methods support
+ this feature. In these indexes, the values of columns listed
in the <literal>INCLUDE</literal> clause are included in leaf tuples
which correspond to heap tuples, but are not included in upper-level
index entries used for tree navigation.
diff --git a/doc/src/sgml/spgist.sgml b/doc/src/sgml/spgist.sgml
index 0e04a08679..868a140a6a 100644
--- a/doc/src/sgml/spgist.sgml
+++ b/doc/src/sgml/spgist.sgml
@@ -240,6 +240,14 @@
inner tuples that are passed through to reach the leaf level.
</para>
+ <para>
+ In case when <acronym>SP-GiST</acronym> index is created with
+ <literal>INCLUDE</literal> clause i.e. covering index, leaf tuples also
+ contain data from included columns. This data is stored uncompressed and can have
+ data types without any SP-GiST operator class.
+
+ </para>
+
<para>
Inner tuples are more complex, since they are branching points in the
search tree. Each inner tuple contains a set of one or more
diff --git a/src/backend/access/spgist/README b/src/backend/access/spgist/README
index b55b073832..87e08431fa 100644
--- a/src/backend/access/spgist/README
+++ b/src/backend/access/spgist/README
@@ -73,8 +73,8 @@ Leaf tuple consists of:
Example:
radix tree - the rest of string (postfix)
quad and k-d tree - the point itself
-
ItemPointer to the heap
+ optional included colums values
NULLS HANDLING
diff --git a/src/backend/access/spgist/spgdoinsert.c b/src/backend/access/spgist/spgdoinsert.c
index 934d65b89f..4c133b7106 100644
--- a/src/backend/access/spgist/spgdoinsert.c
+++ b/src/backend/access/spgist/spgdoinsert.c
@@ -22,7 +22,7 @@
#include "miscadmin.h"
#include "storage/bufmgr.h"
#include "utils/rel.h"
-
+#include "access/htup_details.h"
/*
* SPPageDesc tracks all info about a page we are inserting into. In some
@@ -220,7 +220,7 @@ addLeafTuple(Relation index, SpGistState *state, SpGistLeafTuple leafTuple,
SpGistBlockIsRoot(current->blkno))
{
/* Tuple is not part of a chain */
- leafTuple->nextOffset = InvalidOffsetNumber;
+ SGLT_SET_OFFSET(leafTuple->nextOffset, InvalidOffsetNumber);
current->offnum = SpGistPageAddNewItem(state, current->page,
(Item) leafTuple, leafTuple->size,
NULL, false);
@@ -253,7 +253,7 @@ addLeafTuple(Relation index, SpGistState *state, SpGistLeafTuple leafTuple,
PageGetItemId(current->page, current->offnum));
if (head->tupstate == SPGIST_LIVE)
{
- leafTuple->nextOffset = head->nextOffset;
+ SGLT_SET_OFFSET(leafTuple->nextOffset, SGLT_GET_OFFSET(head->nextOffset));
offnum = SpGistPageAddNewItem(state, current->page,
(Item) leafTuple, leafTuple->size,
NULL, false);
@@ -264,14 +264,14 @@ addLeafTuple(Relation index, SpGistState *state, SpGistLeafTuple leafTuple,
*/
head = (SpGistLeafTuple) PageGetItem(current->page,
PageGetItemId(current->page, current->offnum));
- head->nextOffset = offnum;
+ SGLT_SET_OFFSET(head->nextOffset, offnum);
xlrec.offnumLeaf = offnum;
xlrec.offnumHeadLeaf = current->offnum;
}
else if (head->tupstate == SPGIST_DEAD)
{
- leafTuple->nextOffset = InvalidOffsetNumber;
+ SGLT_SET_OFFSET(leafTuple->nextOffset, InvalidOffsetNumber);
PageIndexTupleDelete(current->page, current->offnum);
if (PageAddItem(current->page,
(Item) leafTuple, leafTuple->size,
@@ -362,13 +362,13 @@ checkSplitConditions(Relation index, SpGistState *state,
{
/* We could see a DEAD tuple as first/only chain item */
Assert(i == current->offnum);
- Assert(it->nextOffset == InvalidOffsetNumber);
+ Assert(SGLT_GET_OFFSET(it->nextOffset) == InvalidOffsetNumber);
/* Don't count it in result, because it won't go to other page */
}
else
elog(ERROR, "unexpected SPGiST tuple state: %d", it->tupstate);
- i = it->nextOffset;
+ i = SGLT_GET_OFFSET(it->nextOffset);
}
*nToSplit = n;
@@ -437,7 +437,7 @@ moveLeafs(Relation index, SpGistState *state,
{
/* We could see a DEAD tuple as first/only chain item */
Assert(i == current->offnum);
- Assert(it->nextOffset == InvalidOffsetNumber);
+ Assert(SGLT_GET_OFFSET(it->nextOffset) == InvalidOffsetNumber);
/* We don't want to move it, so don't count it in size */
toDelete[nDelete] = i;
nDelete++;
@@ -446,7 +446,7 @@ moveLeafs(Relation index, SpGistState *state,
else
elog(ERROR, "unexpected SPGiST tuple state: %d", it->tupstate);
- i = it->nextOffset;
+ i = SGLT_GET_OFFSET(it->nextOffset);
}
/* Find a leaf page that will hold them */
@@ -475,7 +475,7 @@ moveLeafs(Relation index, SpGistState *state,
* don't care). We're modifying the tuple on the source page
* here, but it's okay since we're about to delete it.
*/
- it->nextOffset = r;
+ SGLT_SET_OFFSET(it->nextOffset, r);
r = SpGistPageAddNewItem(state, npage, (Item) it, it->size,
&startOffset, false);
@@ -490,7 +490,7 @@ moveLeafs(Relation index, SpGistState *state,
}
/* add the new tuple as well */
- newLeafTuple->nextOffset = r;
+ SGLT_SET_OFFSET(newLeafTuple->nextOffset, r);
r = SpGistPageAddNewItem(state, npage,
(Item) newLeafTuple, newLeafTuple->size,
&startOffset, false);
@@ -709,6 +709,9 @@ doPickSplit(Relation index, SpGistState *state,
int nToDelete,
nToInsert,
maxToInclude;
+ Datum *leafChainDatums;
+ bool *leafChainIsnulls;
+ const int natts = IndexRelationGetNumberOfAttributes(index);
in.level = level;
@@ -723,14 +726,16 @@ doPickSplit(Relation index, SpGistState *state,
toInsert = (OffsetNumber *) palloc(sizeof(OffsetNumber) * n);
newLeafs = (SpGistLeafTuple *) palloc(sizeof(SpGistLeafTuple) * n);
leafPageSelect = (uint8 *) palloc(sizeof(uint8) * n);
-
STORE_STATE(state, xlrec.stateSrc);
+ leafChainDatums = (Datum *) palloc(n * natts * sizeof(Datum));
+ leafChainIsnulls = (bool *) palloc(n * natts * sizeof(bool));
+
/*
- * Form list of leaf tuples which will be distributed as split result;
- * also, count up the amount of space that will be freed from current.
- * (Note that in the non-root case, we won't actually delete the old
- * tuples, only replace them with redirects or placeholders.)
+ * Collect leaf tuples which will be distributed as split result; also,
+ * count up the amount of space that will be freed from current. (Note
+ * that in the non-root case, we won't actually delete the old tuples,
+ * only replace them with redirects or placeholders.)
*
* Note: the SGLTDATUM calls here are safe even when dealing with a nulls
* page. For a pass-by-value data type we will fetch a word that must
@@ -738,7 +743,15 @@ doPickSplit(Relation index, SpGistState *state,
* tuples must have size at least SGDTSIZE). For a pass-by-reference type
* we are just computing a pointer that isn't going to get dereferenced.
* So it's not worth guarding the calls with isNulls checks.
+ *
+ * Datums and isnulls of all leaf tuple attributes in a chain are
+ * collected into 2-d arrays: (number of tuples in chain) x (number of
+ * attributes) First attribute is key, the other - included attributes (if
+ * any). After picksplit we need to form new leaf tuples as key attribute
+ * length can change which can affect alignment of every include
+ * attribute.
*/
+
nToInsert = 0;
nToDelete = 0;
spaceToDelete = 0;
@@ -759,6 +772,8 @@ doPickSplit(Relation index, SpGistState *state,
{
in.datums[nToInsert] = SGLTDATUM(it, state);
heapPtrs[nToInsert] = it->heapPtr;
+ SpGistDeformLeafTuple(it, state, leafChainDatums + nToInsert * natts,
+ leafChainIsnulls + nToInsert * natts, isNulls);
nToInsert++;
toDelete[nToDelete] = i;
nToDelete++;
@@ -784,6 +799,9 @@ doPickSplit(Relation index, SpGistState *state,
{
in.datums[nToInsert] = SGLTDATUM(it, state);
heapPtrs[nToInsert] = it->heapPtr;
+
+ SpGistDeformLeafTuple(it, state, leafChainDatums + nToInsert * natts,
+ leafChainIsnulls + nToInsert * natts, isNulls);
nToInsert++;
toDelete[nToDelete] = i;
nToDelete++;
@@ -795,7 +813,7 @@ doPickSplit(Relation index, SpGistState *state,
{
/* We could see a DEAD tuple as first/only chain item */
Assert(i == current->offnum);
- Assert(it->nextOffset == InvalidOffsetNumber);
+ Assert(SGLT_GET_OFFSET(it->nextOffset) == InvalidOffsetNumber);
toDelete[nToDelete] = i;
nToDelete++;
/* replacing it with redirect will save no space */
@@ -803,7 +821,7 @@ doPickSplit(Relation index, SpGistState *state,
else
elog(ERROR, "unexpected SPGiST tuple state: %d", it->tupstate);
- i = it->nextOffset;
+ i = SGLT_GET_OFFSET(it->nextOffset);
}
}
in.nTuples = nToInsert;
@@ -816,10 +834,17 @@ doPickSplit(Relation index, SpGistState *state,
*/
in.datums[in.nTuples] = SGLTDATUM(newLeafTuple, state);
heapPtrs[in.nTuples] = newLeafTuple->heapPtr;
+
+ SpGistDeformLeafTuple(newLeafTuple, state, leafChainDatums + (in.nTuples) * natts,
+ leafChainIsnulls + (in.nTuples) * natts, isNulls);
in.nTuples++;
memset(&out, 0, sizeof(out));
+ /*
+ * Process collected key values of tuples from the chain. Included values
+ * are used to build fresh leaf tuples unchanged.
+ */
if (!isNulls)
{
/*
@@ -837,9 +862,11 @@ doPickSplit(Relation index, SpGistState *state,
totalLeafSizes = 0;
for (i = 0; i < in.nTuples; i++)
{
- newLeafs[i] = spgFormLeafTuple(state, heapPtrs + i,
- out.leafTupleDatums[i],
- false);
+ *(leafChainDatums + i * natts) = (Datum) out.leafTupleDatums[i];
+ *(leafChainIsnulls + i * natts) = false;
+
+ newLeafs[i] = spgFormLeafTuple(state, heapPtrs + i, leafChainDatums + i * natts,
+ leafChainIsnulls + i * natts);
totalLeafSizes += newLeafs[i]->size + sizeof(ItemIdData);
}
}
@@ -860,9 +887,14 @@ doPickSplit(Relation index, SpGistState *state,
totalLeafSizes = 0;
for (i = 0; i < in.nTuples; i++)
{
- newLeafs[i] = spgFormLeafTuple(state, heapPtrs + i,
- (Datum) 0,
- true);
+ /*
+ * Nulls tree can contain only null key values.
+ */
+ *(leafChainDatums + i * natts) = (Datum) 0;
+ *(leafChainIsnulls + i * natts) = true;
+
+ newLeafs[i] = spgFormLeafTuple(state, heapPtrs + i, leafChainDatums + i * natts,
+ leafChainIsnulls + i * natts);
totalLeafSizes += newLeafs[i]->size + sizeof(ItemIdData);
}
}
@@ -1196,10 +1228,10 @@ doPickSplit(Relation index, SpGistState *state,
if (ItemPointerIsValid(&nodes[n]->t_tid))
{
Assert(ItemPointerGetBlockNumber(&nodes[n]->t_tid) == leafBlock);
- it->nextOffset = ItemPointerGetOffsetNumber(&nodes[n]->t_tid);
+ SGLT_SET_OFFSET(it->nextOffset, ItemPointerGetOffsetNumber(&nodes[n]->t_tid));
}
else
- it->nextOffset = InvalidOffsetNumber;
+ SGLT_SET_OFFSET(it->nextOffset, InvalidOffsetNumber);
/* Insert it on page */
newoffset = SpGistPageAddNewItem(state, BufferGetPage(leafBuffer),
@@ -1889,67 +1921,83 @@ spgSplitNodeAction(Relation index, SpGistState *state,
*/
bool
spgdoinsert(Relation index, SpGistState *state,
- ItemPointer heapPtr, Datum datum, bool isnull)
+ ItemPointer heapPtr, Datum *datum, bool *isnull)
{
int level = 0;
- Datum leafDatum;
+ Datum *leafDatum;
int leafSize;
SPPageDesc current,
parent;
FmgrInfo *procinfo = NULL;
+ int i;
/*
* Look up FmgrInfo of the user-defined choose function once, to save
* cycles in the loop below.
*/
- if (!isnull)
+ if (!isnull[0])
procinfo = index_getprocinfo(index, 1, SPGIST_CHOOSE_PROC);
/*
* Prepare the leaf datum to insert.
- *
+ */
+
+ leafDatum = (Datum *) palloc0(sizeof(Datum) * (IndexRelationGetNumberOfAttributes(index)));
+
+ /*
* If an optional "compress" method is provided, then call it to form the
- * leaf datum from the input datum. Otherwise store the input datum as
- * is. Since we don't use index_form_tuple in this AM, we have to make
- * sure value to be inserted is not toasted; FormIndexDatum doesn't
- * guarantee that. But we assume the "compress" method to return an
- * untoasted value.
+ * key datum from the input datum. Otherwise store the input datum as is.
+ * Since we don't use index_form_tuple in this AM, we have to make sure
+ * value to be inserted is not toasted; FormIndexDatum doesn't guarantee
+ * that. But we assume the "compress" method to return an untoasted
+ * value.
*/
- if (!isnull)
+ if (!isnull[0])
{
if (OidIsValid(index_getprocid(index, 1, SPGIST_COMPRESS_PROC)))
{
FmgrInfo *compressProcinfo = NULL;
compressProcinfo = index_getprocinfo(index, 1, SPGIST_COMPRESS_PROC);
- leafDatum = FunctionCall1Coll(compressProcinfo,
- index->rd_indcollation[0],
- datum);
+ leafDatum[0] = FunctionCall1Coll(compressProcinfo,
+ index->rd_indcollation[0],
+ datum[0]);
}
else
{
Assert(state->attLeafType.type == state->attType.type);
if (state->attType.attlen == -1)
- leafDatum = PointerGetDatum(PG_DETOAST_DATUM(datum));
+ leafDatum[0] = PointerGetDatum(PG_DETOAST_DATUM(datum[0]));
else
- leafDatum = datum;
+ leafDatum[0] = datum[0];
}
}
else
- leafDatum = (Datum) 0;
+ leafDatum[0] = (Datum) 0;
+
+ for (i = 1; i < IndexRelationGetNumberOfAttributes(index); i++)
+ {
+ if (!isnull[i])
+ {
+ if (TupleDescAttr(state->includeTupdesc, i - 1)->attlen == -1)
+ leafDatum[i] = PointerGetDatum(PG_DETOAST_DATUM(datum[i]));
+ else
+ leafDatum[i] = datum[i];
+ }
+ else
+ leafDatum[i] = (Datum) 0;
+ }
+
/*
- * Compute space needed for a leaf tuple containing the given datum.
+ * Compute space needed on a page for a leaf tuple containing the given
+ * datum.
*
* If it isn't gonna fit, and the opclass can't reduce the datum size by
* suffixing, bail out now rather than getting into an endless loop.
*/
- if (!isnull)
- leafSize = SGLTHDRSZ + sizeof(ItemIdData) +
- SpGistGetTypeSize(&state->attLeafType, leafDatum);
- else
- leafSize = SGDTSIZE + sizeof(ItemIdData);
+ leafSize = SpgLeafSize(state, leafDatum, isnull) + sizeof(ItemIdData);
if (leafSize > SPGIST_PAGE_CAPACITY && !state->config.longValuesOK)
ereport(ERROR,
@@ -1961,7 +2009,7 @@ spgdoinsert(Relation index, SpGistState *state,
errhint("Values larger than a buffer page cannot be indexed.")));
/* Initialize "current" to the appropriate root page */
- current.blkno = isnull ? SPGIST_NULL_BLKNO : SPGIST_ROOT_BLKNO;
+ current.blkno = isnull[0] ? SPGIST_NULL_BLKNO : SPGIST_ROOT_BLKNO;
current.buffer = InvalidBuffer;
current.page = NULL;
current.offnum = FirstOffsetNumber;
@@ -1995,7 +2043,7 @@ spgdoinsert(Relation index, SpGistState *state,
*/
current.buffer =
SpGistGetBuffer(index,
- GBUF_LEAF | (isnull ? GBUF_NULLS : 0),
+ GBUF_LEAF | (isnull[0] ? GBUF_NULLS : 0),
Min(leafSize, SPGIST_PAGE_CAPACITY),
&isNew);
current.blkno = BufferGetBlockNumber(current.buffer);
@@ -2037,7 +2085,7 @@ spgdoinsert(Relation index, SpGistState *state,
current.page = BufferGetPage(current.buffer);
/* should not arrive at a page of the wrong type */
- if (isnull ? !SpGistPageStoresNulls(current.page) :
+ if (isnull[0] ? !SpGistPageStoresNulls(current.page) :
SpGistPageStoresNulls(current.page))
elog(ERROR, "SPGiST index page %u has wrong nulls flag",
current.blkno);
@@ -2054,7 +2102,7 @@ spgdoinsert(Relation index, SpGistState *state,
{
/* it fits on page, so insert it and we're done */
addLeafTuple(index, state, leafTuple,
- ¤t, &parent, isnull, isNew);
+ ¤t, &parent, isnull[0], isNew);
break;
}
else if ((sizeToSplit =
@@ -2068,14 +2116,14 @@ spgdoinsert(Relation index, SpGistState *state,
* chain to another leaf page rather than splitting it.
*/
Assert(!isNew);
- moveLeafs(index, state, ¤t, &parent, leafTuple, isnull);
+ moveLeafs(index, state, ¤t, &parent, leafTuple, isnull[0]);
break; /* we're done */
}
else
{
/* picksplit */
if (doPickSplit(index, state, ¤t, &parent,
- leafTuple, level, isnull, isNew))
+ leafTuple, level, isnull[0], isNew))
break; /* doPickSplit installed new tuples */
/* leaf tuple will not be inserted yet */
@@ -2110,8 +2158,8 @@ spgdoinsert(Relation index, SpGistState *state,
innerTuple = (SpGistInnerTuple) PageGetItem(current.page,
PageGetItemId(current.page, current.offnum));
- in.datum = datum;
- in.leafDatum = leafDatum;
+ in.datum = datum[0];
+ in.leafDatum = leafDatum[0];
in.level = level;
in.allTheSame = innerTuple->allTheSame;
in.hasPrefix = (innerTuple->prefixSize > 0);
@@ -2121,7 +2169,7 @@ spgdoinsert(Relation index, SpGistState *state,
memset(&out, 0, sizeof(out));
- if (!isnull)
+ if (!isnull[0])
{
/* use user-defined choose method */
FunctionCall2Coll(procinfo,
@@ -2158,11 +2206,11 @@ spgdoinsert(Relation index, SpGistState *state,
/* Adjust level as per opclass request */
level += out.result.matchNode.levelAdd;
/* Replace leafDatum and recompute leafSize */
- if (!isnull)
+ if (!isnull[0])
{
- leafDatum = out.result.matchNode.restDatum;
- leafSize = SGLTHDRSZ + sizeof(ItemIdData) +
- SpGistGetTypeSize(&state->attLeafType, leafDatum);
+ leafDatum[0] = out.result.matchNode.restDatum;
+ leafSize = SpgLeafSize(state, leafDatum, isnull) +
+ sizeof(ItemIdData);
}
/*
@@ -2227,6 +2275,6 @@ spgdoinsert(Relation index, SpGistState *state,
SpGistSetLastUsedPage(index, parent.buffer);
UnlockReleaseBuffer(parent.buffer);
}
-
+ pfree(leafDatum);
return true;
}
diff --git a/src/backend/access/spgist/spginsert.c b/src/backend/access/spgist/spginsert.c
index e4508a2b92..b54ae85f6e 100644
--- a/src/backend/access/spgist/spginsert.c
+++ b/src/backend/access/spgist/spginsert.c
@@ -55,8 +55,7 @@ spgistBuildCallback(Relation index, ItemPointer tid, Datum *values,
* lock on some buffer. So we need to be willing to retry. We can flush
* any temp data when retrying.
*/
- while (!spgdoinsert(index, &buildstate->spgstate, tid,
- *values, *isnull))
+ while (!spgdoinsert(index, &buildstate->spgstate, tid, values, isnull))
{
MemoryContextReset(buildstate->tmpCtx);
}
@@ -226,7 +225,7 @@ spginsert(Relation index, Datum *values, bool *isnull,
* to avoid cumulative memory consumption. That means we also have to
* redo initSpGistState(), but it's cheap enough not to matter.
*/
- while (!spgdoinsert(index, &spgstate, ht_ctid, *values, *isnull))
+ while (!spgdoinsert(index, &spgstate, ht_ctid, values, isnull))
{
MemoryContextReset(insertCtx);
initSpGistState(&spgstate, index);
diff --git a/src/backend/access/spgist/spgscan.c b/src/backend/access/spgist/spgscan.c
index 4d506bfb9a..5a3c7c50cf 100644
--- a/src/backend/access/spgist/spgscan.c
+++ b/src/backend/access/spgist/spgscan.c
@@ -28,7 +28,8 @@
typedef void (*storeRes_func) (SpGistScanOpaque so, ItemPointer heapPtr,
Datum leafValue, bool isNull, bool recheck,
- bool recheckDistances, double *distances);
+ bool recheckDistances, double *distances,
+ SpGistLeafTuple leafTuple);
/*
* Pairing heap comparison function for the SpGistSearchItem queue.
@@ -88,6 +89,9 @@ spgFreeSearchItem(SpGistScanOpaque so, SpGistSearchItem *item)
if (item->traversalValue)
pfree(item->traversalValue);
+ if (item->isLeaf && item->leafTuple)
+ pfree(item->leafTuple);
+
pfree(item);
}
@@ -134,6 +138,8 @@ spgAddStartItem(SpGistScanOpaque so, bool isnull)
startEntry->recheck = false;
startEntry->recheckDistances = false;
+ startEntry->leafTuple = NULL;
+
spgAddSearchItemToQueue(so, startEntry);
}
@@ -438,14 +444,30 @@ spgendscan(IndexScanDesc scan)
* Leaf SpGistSearchItem constructor, called in queue context
*/
static SpGistSearchItem *
-spgNewHeapItem(SpGistScanOpaque so, int level, ItemPointer heapPtr,
+spgNewHeapItem(SpGistScanOpaque so, int level, SpGistLeafTuple leafTuple,
Datum leafValue, bool recheck, bool recheckDistances,
bool isnull, double *distances)
{
SpGistSearchItem *item = spgAllocSearchItem(so, isnull, distances);
+ /*
+ * If there are include attributes search item in the queue should contain
+ * them.
+ */
+ if (so->state.includeTupdesc)
+ {
+ Assert(so->state.includeTupdesc->natts);
+
+ item->leafTuple = palloc(leafTuple->size);
+ memcpy(item->leafTuple, leafTuple, leafTuple->size);
+ }
+ else
+ {
+ item->leafTuple = NULL;
+ }
+
item->level = level;
- item->heapPtr = *heapPtr;
+ item->heapPtr = leafTuple->heapPtr;
/* copy value to queue cxt out of tmp cxt */
item->value = isnull ? (Datum) 0 :
datumCopy(leafValue, so->state.attLeafType.attbyval,
@@ -503,6 +525,8 @@ spgLeafTest(SpGistScanOpaque so, SpGistSearchItem *item,
in.returnData = so->want_itup;
in.leafDatum = SGLTDATUM(leafTuple, &so->state);
+
+
out.leafValue = (Datum) 0;
out.recheck = false;
out.distances = NULL;
@@ -528,7 +552,7 @@ spgLeafTest(SpGistScanOpaque so, SpGistSearchItem *item,
/* the scan is ordered -> add the item to the queue */
MemoryContext oldCxt = MemoryContextSwitchTo(so->traversalCxt);
SpGistSearchItem *heapItem = spgNewHeapItem(so, item->level,
- &leafTuple->heapPtr,
+ leafTuple,
leafValue,
recheck,
recheckDistances,
@@ -543,8 +567,10 @@ spgLeafTest(SpGistScanOpaque so, SpGistSearchItem *item,
{
/* non-ordered scan, so report the item right away */
Assert(!recheckDistances);
+
storeRes(so, &leafTuple->heapPtr, leafValue, isnull,
- recheck, false, NULL);
+ recheck, false, NULL, leafTuple);
+
*reportedSome = true;
}
}
@@ -736,7 +762,7 @@ spgTestLeafTuple(SpGistScanOpaque so,
/* dead tuple should be first in chain */
Assert(offset == ItemPointerGetOffsetNumber(&item->heapPtr));
/* No live entries on this page */
- Assert(leafTuple->nextOffset == InvalidOffsetNumber);
+ Assert(SGLT_GET_OFFSET(leafTuple->nextOffset) == InvalidOffsetNumber);
return SpGistBreakOffsetNumber;
}
}
@@ -750,7 +776,7 @@ spgTestLeafTuple(SpGistScanOpaque so,
spgLeafTest(so, item, leafTuple, isnull, reportedSome, storeRes);
- return leafTuple->nextOffset;
+ return SGLT_GET_OFFSET(leafTuple->nextOffset);
}
/*
@@ -782,8 +808,8 @@ redirect:
{
/* We store heap items in the queue only in case of ordered search */
Assert(so->numberOfNonNullOrderBys > 0);
- storeRes(so, &item->heapPtr, item->value, item->isNull,
- item->recheck, item->recheckDistances, item->distances);
+ storeRes(so, &item->heapPtr, item->value, item->isNull, item->recheck,
+ item->recheckDistances, item->distances, item->leafTuple);
reportedSome = true;
}
else
@@ -877,7 +903,7 @@ redirect:
static void
storeBitmap(SpGistScanOpaque so, ItemPointer heapPtr,
Datum leafValue, bool isnull, bool recheck, bool recheckDistances,
- double *distances)
+ double *distances, SpGistLeafTuple leafTuple)
{
Assert(!recheckDistances && !distances);
tbm_add_tuples(so->tbm, heapPtr, 1, recheck);
@@ -904,7 +930,7 @@ spggetbitmap(IndexScanDesc scan, TIDBitmap *tbm)
static void
storeGettuple(SpGistScanOpaque so, ItemPointer heapPtr,
Datum leafValue, bool isnull, bool recheck, bool recheckDistances,
- double *nonNullDistances)
+ double *nonNullDistances, SpGistLeafTuple leafTuple)
{
Assert(so->nPtrs < MaxIndexTuplesPerPage);
so->heapPtrs[so->nPtrs] = *heapPtr;
@@ -949,9 +975,38 @@ storeGettuple(SpGistScanOpaque so, ItemPointer heapPtr,
* Reconstruct index data. We have to copy the datum out of the temp
* context anyway, so we may as well create the tuple here.
*/
- so->reconTups[so->nPtrs] = heap_form_tuple(so->indexTupDesc,
- &leafValue,
- &isnull);
+ if (so->state.includeTupdesc)
+ {
+ /* Add included attributes */
+ Datum *leafDatums;
+ bool *leafIsnulls;
+
+ Assert(so->state.includeTupdesc->natts);
+
+ leafDatums = (Datum *) palloc(sizeof(Datum) * (so->state.includeTupdesc->natts + 1));
+ leafIsnulls = (bool *) palloc(sizeof(bool) * (so->state.includeTupdesc->natts + 1));
+
+ SpGistDeformLeafTuple(leafTuple, &so->state, leafDatums, leafIsnulls, isnull);
+
+ /*
+ * override key value extracted from LeafTuple in case we've
+ * reconstructed it already
+ */
+ leafDatums[0] = leafValue;
+ leafIsnulls[0] = isnull;
+
+ so->reconTups[so->nPtrs] = heap_form_tuple(so->indexTupDesc,
+ leafDatums,
+ leafIsnulls);
+ pfree(leafDatums);
+ pfree(leafIsnulls);
+ }
+ else
+ {
+ so->reconTups[so->nPtrs] = heap_form_tuple(so->indexTupDesc,
+ &leafValue,
+ &isnull);
+ }
}
so->nPtrs++;
}
@@ -1019,6 +1074,10 @@ spgcanreturn(Relation index, int attno)
{
SpGistCache *cache;
+ /* Included attributes always can be fetched for index-only scans */
+ if (attno > 1)
+ return true;
+
/* We can do it if the opclass config function says so */
cache = spgGetCache(index);
diff --git a/src/backend/access/spgist/spgutils.c b/src/backend/access/spgist/spgutils.c
index 0efe05e552..9b1633eeda 100644
--- a/src/backend/access/spgist/spgutils.c
+++ b/src/backend/access/spgist/spgutils.c
@@ -31,7 +31,18 @@
#include "utils/index_selfuncs.h"
#include "utils/lsyscache.h"
#include "utils/syscache.h"
+#include "access/itup.h"
+#include "access/detoast.h"
+#include "access/toast_internals.h"
+#include "access/heaptoast.h"
+#include "utils/expandeddatum.h"
+/* Does att's datatype allow packing into the 1-byte-header varlena format? */
+#define ATT_IS_PACKABLE(att) \
+ ((att)->attlen == -1 && (att)->attstorage != TYPSTORAGE_PLAIN)
+
+Size spgIncludedDataSize(TupleDesc tupleDesc, Datum *values,
+ bool *isnull, Size start);
/*
* SP-GiST handler function: return IndexAmRoutine with access method parameters
@@ -49,7 +60,7 @@ spghandler(PG_FUNCTION_ARGS)
amroutine->amcanorderbyop = true;
amroutine->amcanbackward = false;
amroutine->amcanunique = false;
- amroutine->amcanmulticol = false;
+ amroutine->amcanmulticol = true;
amroutine->amoptionalkey = true;
amroutine->amsearcharray = false;
amroutine->amsearchnulls = true;
@@ -57,7 +68,7 @@ spghandler(PG_FUNCTION_ARGS)
amroutine->amclusterable = false;
amroutine->ampredlocks = false;
amroutine->amcanparallel = false;
- amroutine->amcaninclude = false;
+ amroutine->amcaninclude = true;
amroutine->amusemaintenanceworkmem = false;
amroutine->amparallelvacuumoptions =
VACUUM_OPTION_PARALLEL_BULKDEL | VACUUM_OPTION_PARALLEL_COND_CLEANUP;
@@ -116,14 +127,21 @@ spgGetCache(Relation index)
cache = MemoryContextAllocZero(index->rd_indexcxt,
sizeof(SpGistCache));
- /* SPGiST doesn't support multi-column indexes */
- Assert(index->rd_att->natts == 1);
+ /*
+ * SPGiST should have one key column and can also have included
+ * columns
+ */
+ if (IndexRelationGetNumberOfKeyAttributes(index) != 1)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("SPGiST index can have only one key column")));
/*
- * Get the actual data type of the indexed column from the index
- * tupdesc. We pass this to the opclass config function so that
- * polymorphic opclasses are possible.
+ * Get the actual data type of the key column from the index tupdesc.
+ * We pass this to the opclass config function so that polymorphic
+ * opclasses are possible.
*/
+
atttype = TupleDescAttr(index->rd_att, 0)->atttypid;
/* Call the config function to get config info for the opclass */
@@ -156,6 +174,7 @@ spgGetCache(Relation index)
fillTypeDesc(&cache->attPrefixType, cache->config.prefixType);
fillTypeDesc(&cache->attLabelType, cache->config.labelType);
+
/* Last, get the lastUsedPages data from the metapage */
metabuffer = ReadBuffer(index, SPGIST_METAPAGE_BLKNO);
LockBuffer(metabuffer, BUFFER_LOCK_SHARE);
@@ -177,7 +196,23 @@ spgGetCache(Relation index)
/* assume it's up to date */
cache = (SpGistCache *) index->rd_amcache;
}
+ /* Form descriptor for included columns if any */
+ if (IndexRelationGetNumberOfAttributes(index) > 1)
+ {
+ int i;
+
+ cache->includeTupdesc = CreateTemplateTupleDesc(
+ IndexRelationGetNumberOfAttributes(index) - 1);
+ for (i = 0; i < IndexRelationGetNumberOfAttributes(index) - 1; i++)
+ {
+ TupleDescInitEntry(cache->includeTupdesc, i + 1, NULL,
+ TupleDescAttr(index->rd_att, i + 1)->atttypid,
+ -1, 0);
+ }
+ }
+ else
+ cache->includeTupdesc = NULL;
return cache;
}
@@ -190,6 +225,7 @@ initSpGistState(SpGistState *state, Relation index)
/* Get cached static information about index */
cache = spgGetCache(index);
+ state->includeTupdesc = cache->includeTupdesc;
state->config = cache->config;
state->attType = cache->attType;
state->attLeafType = cache->attLeafType;
@@ -603,7 +639,7 @@ spgoptions(Datum reloptions, bool validate)
/*
* Get the space needed to store a non-null datum of the indicated type.
- * Note the result is already rounded up to a MAXALIGN boundary.
+ * Note the result is not maxaligned and this should be done by caller if needed.
* Also, we follow the SPGiST convention that pass-by-val types are
* just stored in their Datum representation (compare memcpyDatum).
*/
@@ -619,7 +655,7 @@ SpGistGetTypeSize(SpGistTypeDesc *att, Datum datum)
else
size = VARSIZE_ANY(datum);
- return MAXALIGN(size);
+ return size;
}
/*
@@ -642,36 +678,205 @@ memcpyDatum(void *target, SpGistTypeDesc *att, Datum datum)
}
/*
- * Construct a leaf tuple containing the given heap TID and datum value
+ * Private version of heap_compute_data_size with start address not
+ * at MAXALIGN boundary. The reason is that start address (and alignment)
+ * influence alignment of each of next values and overall size of included
+ * data area in SpGiST leaf tuple. MAXALINGing first include attribute is
+ * avoided for not to introduce unnecessary gap before it.
+ */
+Size
+spgIncludedDataSize(TupleDesc tupleDesc,
+ Datum *values,
+ bool *isnull, Size start)
+{
+ Size data_length = 0;
+ int i;
+ int numberOfAttributes = tupleDesc->natts;
+
+ data_length = start;
+ for (i = 0; i < numberOfAttributes; i++)
+ {
+ Datum val;
+ Form_pg_attribute atti;
+
+ if (isnull[i])
+ continue;
+
+ val = values[i];
+ atti = TupleDescAttr(tupleDesc, i);
+
+ if (ATT_IS_PACKABLE(atti) &&
+ VARATT_CAN_MAKE_SHORT(DatumGetPointer(val)))
+ {
+ /*
+ * we're anticipating converting to a short varlena header, so
+ * adjust length and don't count any alignment
+ */
+ data_length += VARATT_CONVERTED_SHORT_SIZE(DatumGetPointer(val));
+ }
+ else if (atti->attlen == -1 &&
+ VARATT_IS_EXTERNAL_EXPANDED(DatumGetPointer(val)))
+ {
+ /*
+ * we want to flatten the expanded value so that the constructed
+ * tuple doesn't depend on it
+ */
+ data_length = att_align_nominal(data_length, atti->attalign);
+ data_length += EOH_get_flat_size(DatumGetEOHP(val));
+ }
+ else
+ {
+ data_length = att_align_datum(data_length, atti->attalign,
+ atti->attlen, val);
+ data_length = att_addlength_datum(data_length, atti->attlen,
+ val);
+ }
+ }
+ return data_length - start;
+}
+
+/* Calculate overall leaf tuple size. SGLTHDRSZ is MAXALIGNed for backward
+ * compatibility and there might be gap between header and key data. After key
+ * data there are no such gaps more than is is necessary for each value
+ * alignment. Overall result is MAXALIGNed which is anyway unavoidable
+ * when placing a tuple on a page.
+ */
+unsigned int
+SpgLeafSize(SpGistState *state, Datum *datum, bool *isnull)
+{
+ /* compute space needed, nullmask size and offset for include attributes */
+ unsigned int size = SGLTHDRSZ;
+ unsigned int i;
+
+ if (!isnull[0])
+ /* key attribute size (not maxaligned) */
+ size += SpGistGetTypeSize(&state->attLeafType, datum[0]);
+
+ if (state->includeTupdesc)
+ {
+ Assert(state->includeTupdesc->natts);
+ if (state->includeTupdesc->natts + 1 >= INDEX_MAX_KEYS)
+ ereport(ERROR,
+ (errcode(ERRCODE_TOO_MANY_COLUMNS),
+ errmsg("number of index columns (%d) exceeds limit (%d)",
+ state->includeTupdesc->natts, INDEX_MAX_KEYS)));
+ /* nullmask size */
+ for (i = 1; i <= state->includeTupdesc->natts; i++)
+ {
+ if (isnull[i])
+ {
+ size += (state->includeTupdesc->natts / 8) + 1;
+ break;
+ }
+ }
+ /* overall included attributes size each with added proper alignment. */
+ size += spgIncludedDataSize(state->includeTupdesc, datum + 1, isnull + 1, size);
+ }
+ return MAXALIGN(size);
+}
+
+/*
+ * Construct a leaf tuple containing the given heap TID, key data and included
+ * columns data. Key data starts from MAXALIGN boundary for backward compatibility.
+ * Nullmask apply only to included attributes and is placed just after key data if
+ * there is at least one NULL among included attributes. It doesn't need alignment.
+ * Then all included columns data follow aligned by their typealign's.
*/
SpGistLeafTuple
spgFormLeafTuple(SpGistState *state, ItemPointer heapPtr,
- Datum datum, bool isnull)
+ Datum *datum, bool *isnull)
{
SpGistLeafTuple tup;
- unsigned int size;
+ unsigned int size = SGLTHDRSZ;
+ unsigned int include_offset = 0;
+ unsigned int nullmask_size = 0;
+ unsigned int data_offset = 0;
+ unsigned int data_size = 0;
+ uint16 tupmask = 0;
+ int i;
- /* compute space needed (note result is already maxaligned) */
- size = SGLTHDRSZ;
- if (!isnull)
- size += SpGistGetTypeSize(&state->attLeafType, datum);
+ /*
+ * Calculate space needed. If there are include attributes also calculate
+ * sizes and offsets needed for heap_fill_tuple
+ */
+ if (!isnull[0])
+ /* key attribute size (not maxaligned) */
+ size += SpGistGetTypeSize(&state->attLeafType, datum[0]);
+
+ if (state->includeTupdesc)
+ {
+ Assert(state->includeTupdesc->natts);
+ if (state->includeTupdesc->natts + 1 >= INDEX_MAX_KEYS)
+ ereport(ERROR,
+ (errcode(ERRCODE_TOO_MANY_COLUMNS),
+ errmsg("number of index columns (%d) exceeds limit (%d)",
+ state->includeTupdesc->natts, INDEX_MAX_KEYS)));
+
+ include_offset = size;
+
+ for (i = 1; i <= state->includeTupdesc->natts; i++)
+ {
+ if (isnull[i])
+ {
+ nullmask_size = (state->includeTupdesc->natts / 8) + 1;
+ size += nullmask_size;
+ break;
+ }
+ }
+
+ /*
+ * Alignment of all included attributes is counted inside data_size.
+ * data_offset itself is not aligned.
+ */
+ data_size = spgIncludedDataSize(state->includeTupdesc, datum + 1, isnull + 1, size);
+ data_offset = size;
+
+ size += data_size;
+ }
/*
* Ensure that we can replace the tuple with a dead tuple later. This
- * test is unnecessary when !isnull, but let's be safe.
+ * test is unnecessary when !isnull[0], but let's be safe.
*/
if (size < SGDTSIZE)
size = SGDTSIZE;
/* OK, form the tuple */
- tup = (SpGistLeafTuple) palloc0(size);
+ tup = (SpGistLeafTuple) palloc0(MAXALIGN(size));
- tup->size = size;
- tup->nextOffset = InvalidOffsetNumber;
+ tup->size = MAXALIGN(size);
+ SGLT_SET_OFFSET(tup->nextOffset, InvalidOffsetNumber);
tup->heapPtr = *heapPtr;
- if (!isnull)
- memcpyDatum(SGLTDATAPTR(tup), &state->attLeafType, datum);
+ if (!isnull[0])
+ memcpyDatum(SGLTDATAPTR(tup), &state->attLeafType, datum[0]);
+
+ /* Add included columns data to leaf tuple if any. */
+ if (state->includeTupdesc)
+ {
+ /*
+ * The start of include attributes tuple is not aligned by default.
+ * All values alignment should be done by heap_fill_tuple
+ * automaticaly. If there is a nulls mask it is included just after
+ * key attribute data and it should not be aligned.
+ */
+ heap_fill_tuple(state->includeTupdesc, datum + 1, isnull + 1,
+ (char *) tup + data_offset,
+ data_size, &tupmask,
+ (nullmask_size ? (bits8 *) tup + include_offset : NULL));
+
+ if (nullmask_size)
+ SGLT_SET_CONTAINSNULLMASK(tup->nextOffset, 1);
+
+ /*
+ * We do this because heap_fill_tuple wants to initialize a "tupmask"
+ * which is used for HeapTuples, but the only relevant info is the
+ * "has variable attributes" field. We have already set the hasnull
+ * bit above.
+ */
+ if (tupmask & HEAP_HASVARWIDTH)
+ SGLT_SET_CONTAINSVARATT(tup->nextOffset, 1);
+ }
return tup;
}
@@ -688,10 +893,10 @@ spgFormNodeTuple(SpGistState *state, Datum label, bool isnull)
unsigned int size;
unsigned short infomask = 0;
- /* compute space needed (note result is already maxaligned) */
+ /* compute space needed */
size = SGNTHDRSZ;
if (!isnull)
- size += SpGistGetTypeSize(&state->attLabelType, label);
+ size += MAXALIGN(SpGistGetTypeSize(&state->attLabelType, label));
/*
* Here we make sure that the size will fit in the field reserved for it
@@ -735,7 +940,7 @@ spgFormInnerTuple(SpGistState *state, bool hasPrefix, Datum prefix,
/* Compute size needed */
if (hasPrefix)
- prefixSize = SpGistGetTypeSize(&state->attPrefixType, prefix);
+ prefixSize = MAXALIGN(SpGistGetTypeSize(&state->attPrefixType, prefix));
else
prefixSize = 0;
@@ -1046,3 +1251,133 @@ spgproperty(Oid index_oid, int attno,
return true;
}
+
+/*
+ * Convert an SpGist tuple into palloc'd Datum/isnull arrays.
+ *
+ */
+void
+SpGistDeformLeafTuple(SpGistLeafTuple tup, SpGistState *state, Datum *datum, bool *isnull,
+ bool key_isnull)
+{
+ unsigned int include_offset; /* offset of include data */
+ int off;
+ bits8 *nullmask_ptr = NULL; /* ptr to null bitmap in tuple */
+ char *tp;
+ bool slow = false; /* can we use/set attcacheoff? */
+ int i;
+
+ if (key_isnull)
+ {
+ datum[0] = (Datum) 0;
+ isnull[0] = true;
+ }
+ else
+ {
+ datum[0] = SGLTDATUM(tup, state);
+ isnull[0] = false;
+ }
+
+ if (state->includeTupdesc)
+ {
+ Assert(state->includeTupdesc->natts);
+ if (state->includeTupdesc->natts + 1 >= INDEX_MAX_KEYS)
+ ereport(ERROR,
+ (errcode(ERRCODE_TOO_MANY_COLUMNS),
+ errmsg("number of index columns (%d) exceeds limit (%d)",
+ state->includeTupdesc->natts, INDEX_MAX_KEYS)));
+
+ include_offset = key_isnull ? SGLTHDRSZ : SGLTHDRSZ + SpGistGetTypeSize(&state->attLeafType, datum[0]);
+
+ tp = (char *) tup;
+ off = include_offset;
+
+ if (SGLT_GET_CONTAINSNULLMASK(tup->nextOffset))
+ {
+ nullmask_ptr = (bits8 *) tp + include_offset;
+ off += (state->includeTupdesc->natts) / 8 + 1;
+ }
+
+ if (state->attLeafType.attlen > 0 && !SGLT_GET_CONTAINSVARATT(tup->nextOffset) &&
+ !SGLT_GET_CONTAINSNULLMASK(tup->nextOffset))
+ /* can use attcacheoff for all attributes */
+ {
+ for (i = 1; i <= state->includeTupdesc->natts; i++)
+ {
+ Form_pg_attribute thisatt = TupleDescAttr(state->includeTupdesc, i - 1);
+
+ isnull[i] = false;
+ if (thisatt->attcacheoff >= 0)
+ off = thisatt->attcacheoff;
+ else
+ {
+ off = att_align_nominal(off, thisatt->attalign);
+ thisatt->attcacheoff = off;
+ }
+ datum[i] = fetchatt(thisatt, tp + off);
+ off = att_addlength_pointer(off, thisatt->attlen, tp + off);
+ }
+ }
+ else
+
+ /*
+ * general case: can use cache until first null or varlen
+ * attribute
+ */
+ {
+ if (state->attLeafType.attlen <= 0)
+ slow = true; /* can't use attcacheoff at all */
+
+ for (i = 1; i <= state->includeTupdesc->natts; i++)
+ {
+ Form_pg_attribute thisatt = TupleDescAttr(state->includeTupdesc, i - 1);
+
+ if (SGLT_GET_CONTAINSNULLMASK(tup->nextOffset))
+ {
+ if (att_isnull(i - 1, nullmask_ptr))
+ {
+ datum[i] = (Datum) 0;
+ isnull[i] = true;
+ slow = true; /* can't use attcacheoff anymore */
+ continue;
+ }
+ }
+
+ isnull[i] = false;
+
+ if (!slow && thisatt->attcacheoff >= 0)
+ off = thisatt->attcacheoff;
+ else if (thisatt->attlen == -1)
+ {
+ /*
+ * We can only cache the offset for a varlena attribute if
+ * the offset is already suitably aligned, so that there
+ * would be no pad bytes in any case: then the offset will
+ * be valid for either an aligned or unaligned value.
+ */
+ if (!slow && off == att_align_nominal(off, thisatt->attalign))
+ thisatt->attcacheoff = off;
+ else
+ {
+ off = att_align_pointer(off, thisatt->attalign, -1, tp + off);
+ slow = true;
+ }
+ }
+ else
+ {
+ /* not varlena, so safe to use att_align_nominal */
+ off = att_align_nominal(off, thisatt->attalign);
+
+ if (!slow)
+ thisatt->attcacheoff = off;
+ }
+
+ datum[i] = fetchatt(thisatt, tp + off);
+ off = att_addlength_pointer(off, thisatt->attlen, tp + off);
+
+ if (thisatt->attlen <= 0)
+ slow = true; /* can't use attcacheoff anymore */
+ }
+ }
+ }
+}
diff --git a/src/backend/access/spgist/spgvacuum.c b/src/backend/access/spgist/spgvacuum.c
index bd98707f3c..a0d76901fc 100644
--- a/src/backend/access/spgist/spgvacuum.c
+++ b/src/backend/access/spgist/spgvacuum.c
@@ -168,23 +168,28 @@ vacuumLeafPage(spgBulkDeleteState *bds, Relation index, Buffer buffer,
}
/* Form predecessor map, too */
- if (lt->nextOffset != InvalidOffsetNumber)
+ if (SGLT_GET_OFFSET(lt->nextOffset) != InvalidOffsetNumber)
{
/* paranoia about corrupted chain links */
- if (lt->nextOffset < FirstOffsetNumber ||
- lt->nextOffset > max ||
- predecessor[lt->nextOffset] != InvalidOffsetNumber)
+ if (SGLT_GET_OFFSET(lt->nextOffset) < FirstOffsetNumber ||
+ SGLT_GET_OFFSET(lt->nextOffset) > max ||
+ predecessor[SGLT_GET_OFFSET(lt->nextOffset)] != InvalidOffsetNumber)
elog(ERROR, "inconsistent tuple chain links in page %u of index \"%s\"",
BufferGetBlockNumber(buffer),
RelationGetRelationName(index));
- predecessor[lt->nextOffset] = i;
+ predecessor[SGLT_GET_OFFSET(lt->nextOffset)] = i;
}
}
else if (lt->tupstate == SPGIST_REDIRECT)
{
SpGistDeadTuple dt = (SpGistDeadTuple) lt;
- Assert(dt->nextOffset == InvalidOffsetNumber);
+ /*
+ * Dead tuple nextOffset is allowed to have any values of two
+ * highest bits in case it is inherited from SpGistLeafTuple where
+ * these bits has their own meaning.
+ */
+ Assert(SGLT_GET_OFFSET(dt->nextOffset) == InvalidOffsetNumber);
Assert(ItemPointerIsValid(&dt->pointer));
/*
@@ -201,7 +206,7 @@ vacuumLeafPage(spgBulkDeleteState *bds, Relation index, Buffer buffer,
}
else
{
- Assert(lt->nextOffset == InvalidOffsetNumber);
+ Assert(SGLT_GET_OFFSET(lt->nextOffset) == InvalidOffsetNumber);
}
}
@@ -250,7 +255,7 @@ vacuumLeafPage(spgBulkDeleteState *bds, Relation index, Buffer buffer,
prevLive = deletable[i] ? InvalidOffsetNumber : i;
/* scan down the chain ... */
- j = head->nextOffset;
+ j = SGLT_GET_OFFSET(head->nextOffset);
while (j != InvalidOffsetNumber)
{
SpGistLeafTuple lt;
@@ -301,7 +306,7 @@ vacuumLeafPage(spgBulkDeleteState *bds, Relation index, Buffer buffer,
interveningDeletable = false;
}
- j = lt->nextOffset;
+ j = SGLT_GET_OFFSET(lt->nextOffset);
}
if (prevLive == InvalidOffsetNumber)
@@ -366,7 +371,7 @@ vacuumLeafPage(spgBulkDeleteState *bds, Relation index, Buffer buffer,
lt = (SpGistLeafTuple) PageGetItem(page,
PageGetItemId(page, chainSrc[i]));
Assert(lt->tupstate == SPGIST_LIVE);
- lt->nextOffset = chainDest[i];
+ SGLT_SET_OFFSET(lt->nextOffset, chainDest[i]);
}
MarkBufferDirty(buffer);
diff --git a/src/backend/access/spgist/spgxlog.c b/src/backend/access/spgist/spgxlog.c
index 7be2291d07..4022e3af07 100644
--- a/src/backend/access/spgist/spgxlog.c
+++ b/src/backend/access/spgist/spgxlog.c
@@ -122,8 +122,8 @@ spgRedoAddLeaf(XLogReaderState *record)
head = (SpGistLeafTuple) PageGetItem(page,
PageGetItemId(page, xldata->offnumHeadLeaf));
- Assert(head->nextOffset == leafTupleHdr.nextOffset);
- head->nextOffset = xldata->offnumLeaf;
+ Assert(SGLT_GET_OFFSET(head->nextOffset) == SGLT_GET_OFFSET(leafTupleHdr.nextOffset));
+ SGLT_SET_OFFSET(head->nextOffset, xldata->offnumLeaf);
}
}
else
@@ -822,7 +822,7 @@ spgRedoVacuumLeaf(XLogReaderState *record)
lt = (SpGistLeafTuple) PageGetItem(page,
PageGetItemId(page, chainSrc[i]));
Assert(lt->tupstate == SPGIST_LIVE);
- lt->nextOffset = chainDest[i];
+ SGLT_SET_OFFSET(lt->nextOffset, chainDest[i]);
}
PageSetLSN(page, lsn);
diff --git a/src/include/access/spgist_private.h b/src/include/access/spgist_private.h
index 00b98ec6a0..74cb715eaf 100644
--- a/src/include/access/spgist_private.h
+++ b/src/include/access/spgist_private.h
@@ -141,6 +141,7 @@ typedef struct SpGistState
SpGistTypeDesc attLeafType; /* type of leaf-tuple values */
SpGistTypeDesc attPrefixType; /* type of inner-tuple prefix values */
SpGistTypeDesc attLabelType; /* type of node label values */
+ TupleDesc includeTupdesc; /* tuple descriptor of included columns */
char *deadTupleStorage; /* workspace for spgFormDeadTuple */
@@ -148,6 +149,98 @@ typedef struct SpGistState
bool isBuild; /* true if doing index build */
} SpGistState;
+/*
+ * SPGiST leaf tuple: carries a datum and a heap tuple TID
+ *
+ * In the simplest case, the datum is the same as the indexed value; but
+ * it could also be a suffix or some other sort of delta that permits
+ * reconstruction given knowledge of the prefix path traversed to get here.
+ *
+ * The size field is wider than could possibly be needed for an on-disk leaf
+ * tuple, but this allows us to form leaf tuples even when the datum is too
+ * wide to be stored immediately, and it costs nothing because of alignment
+ * considerations.
+ *
+ * Normally, nextOffset links to the next tuple belonging to the same parent
+ * node (which must be on the same page). But when the root page is a leaf
+ * page, we don't chain its tuples, so nextOffset is always 0 on the root.
+ *
+ * size must be a multiple of MAXALIGN; also, it must be at least SGDTSIZE
+ * so that the tuple can be converted to REDIRECT status later. (This
+ * restriction only adds bytes for the null-datum case, otherwise alignment
+ * restrictions force it anyway.)
+ *
+ * In a leaf tuple for a NULL indexed value, there's no useful datum value;
+ * however, the SGDTSIZE limit ensures that's there's a Datum word there
+ * anyway, so SGLTDATUM can be applied safely as long as you don't do
+ * anything with the result.
+ *
+ * Minimum space to store SpGistLeafTuple on a page is 12 bytes tuple header
+ * and 4 bytes ItemIdData so 14 lower bits of nextOffset (accessed as
+ * SGLT_GET/SET_OFFSET) is enough to store actual tuple number on a page even
+ * if page size is 64Kb. Two higher bits are to store per-tuple
+ * information is there nulls mask exist and is there any included attribute
+ * of variable length type.
+ */
+
+typedef struct SpGistLeafTupleData
+{
+ unsigned int tupstate:2, /* LIVE/REDIRECT/DEAD/PLACEHOLDER */
+ size:30; /* large enough for any palloc'able value */
+ OffsetNumber nextOffset; /* higher 1 bit = 1 if included values has
+ * nulls, 2 bit = 1 if included values contain
+ * variable length values, lower 15 bits - is
+ * "actual" nextOffset i.e. number of next
+ * tuple in chain on a page, or
+ * InvalidOffsetNumber. They SHOULD NOT be
+ * set/read directly,
+ * SGLT_SET_XXX/SGLT_GET_XXX macros must be
+ * used instead. */
+ ItemPointerData heapPtr; /* TID of represented heap tuple */
+ /* leaf datum follows */
+
+ /*
+ * if SGLT_GET_CONTAINSNULLMASK nullmask follows. Its size (number of
+ * included columns/8)+1
+ */
+ /* include attributes follow if any */
+} SpGistLeafTupleData;
+
+typedef SpGistLeafTupleData *SpGistLeafTuple;
+
+#define SGLTHDRSZ MAXALIGN(sizeof(SpGistLeafTupleData))
+#define SGLTDATAPTR(x) (((char *) (x)) + SGLTHDRSZ)
+#define SGLTDATUM(x, s) ((s)->attLeafType.attbyval ? \
+ *(Datum *) SGLTDATAPTR(x) : \
+ PointerGetDatum(SGLTDATAPTR(x)))
+/*
+ * Accessor macros to get and set actual 14-bit offset and two bit flags from/to
+ * nextOffset value.
+ */
+#define SGLT_GET_OFFSET(x) ( (x) & 0x3FFF )
+#define SGLT_GET_CONTAINSNULLMASK(x) ( (x) >> 15 )
+#define SGLT_GET_CONTAINSVARATT(x) ( ( (x) & 0x4000 ) >> 14 )
+#define SGLT_SET_OFFSET(x,o) ( (x) = ( (x) & 0xC000 ) | ( (o) & 0x3FFF) )
+#define SGLT_SET_CONTAINSNULLMASK(x,n) ( (x) = ( (n) << 15 ) | ( (x) & 0x3FFF ) )
+#define SGLT_SET_CONTAINSVARATT(x,v) ( (x) = ( (v) << 14 ) | ( (x) & 0xBFFF ) )
+
+#define SGLT_GET_INCLUDE_TUPSIZE(x) SGLT_GET_OFFSET(x)
+#define SGLT_SET_INCLUDE_TUPSIZE(x,o) SGLT_SET_OFFSET(x,o)
+
+extern char *SpGistFormIncludeTuple(TupleDesc tupleDescriptor, Datum *values,
+ bool *isnull, uint16 *tupdata);
+
+/*
+ * SPGiST dead tuple: declaration for examining non-live tuples
+ *
+ * The tupstate field of this struct must match those of regular inner and
+ * leaf tuples, and its size field must match a leaf tuple's.
+ * Also, the pointer field must be in the same place as a leaf tuple's heapPtr
+ * field, to satisfy some Asserts that we make when replacing a leaf tuple
+ * with a dead tuple.
+ * We don't use nextOffset, but it's needed to align the pointer field.
+ */
+
typedef struct SpGistSearchItem
{
pairingheap_node phNode; /* pairing heap node */
@@ -160,14 +253,14 @@ typedef struct SpGistSearchItem
bool isLeaf; /* SearchItem is heap item */
bool recheck; /* qual recheck is needed */
bool recheckDistances; /* distance recheck is needed */
-
+ SpGistLeafTuple leafTuple;
/* array with numberOfOrderBys entries */
double distances[FLEXIBLE_ARRAY_MEMBER];
+ /* if there are include columns SpGistLeafTupleData follow */
} SpGistSearchItem;
#define SizeOfSpGistSearchItem(n_distances) \
(offsetof(SpGistSearchItem, distances) + sizeof(double) * (n_distances))
-
/*
* Private state of an index scan
*/
@@ -241,6 +334,7 @@ typedef struct SpGistCache
SpGistTypeDesc attLeafType; /* type of leaf-tuple values */
SpGistTypeDesc attPrefixType; /* type of inner-tuple prefix values */
SpGistTypeDesc attLabelType; /* type of node label values */
+ TupleDesc includeTupdesc;
SpGistLUPCache lastUsedPages; /* local storage of last-used info */
} SpGistCache;
@@ -321,60 +415,6 @@ typedef SpGistNodeTupleData *SpGistNodeTuple;
*(Datum *) SGNTDATAPTR(x) : \
PointerGetDatum(SGNTDATAPTR(x)))
-/*
- * SPGiST leaf tuple: carries a datum and a heap tuple TID
- *
- * In the simplest case, the datum is the same as the indexed value; but
- * it could also be a suffix or some other sort of delta that permits
- * reconstruction given knowledge of the prefix path traversed to get here.
- *
- * The size field is wider than could possibly be needed for an on-disk leaf
- * tuple, but this allows us to form leaf tuples even when the datum is too
- * wide to be stored immediately, and it costs nothing because of alignment
- * considerations.
- *
- * Normally, nextOffset links to the next tuple belonging to the same parent
- * node (which must be on the same page). But when the root page is a leaf
- * page, we don't chain its tuples, so nextOffset is always 0 on the root.
- *
- * size must be a multiple of MAXALIGN; also, it must be at least SGDTSIZE
- * so that the tuple can be converted to REDIRECT status later. (This
- * restriction only adds bytes for the null-datum case, otherwise alignment
- * restrictions force it anyway.)
- *
- * In a leaf tuple for a NULL indexed value, there's no useful datum value;
- * however, the SGDTSIZE limit ensures that's there's a Datum word there
- * anyway, so SGLTDATUM can be applied safely as long as you don't do
- * anything with the result.
- */
-typedef struct SpGistLeafTupleData
-{
- unsigned int tupstate:2, /* LIVE/REDIRECT/DEAD/PLACEHOLDER */
- size:30; /* large enough for any palloc'able value */
- OffsetNumber nextOffset; /* next tuple in chain, or InvalidOffsetNumber */
- ItemPointerData heapPtr; /* TID of represented heap tuple */
- /* leaf datum follows */
-} SpGistLeafTupleData;
-
-typedef SpGistLeafTupleData *SpGistLeafTuple;
-
-#define SGLTHDRSZ MAXALIGN(sizeof(SpGistLeafTupleData))
-#define SGLTDATAPTR(x) (((char *) (x)) + SGLTHDRSZ)
-#define SGLTDATUM(x, s) ((s)->attLeafType.attbyval ? \
- *(Datum *) SGLTDATAPTR(x) : \
- PointerGetDatum(SGLTDATAPTR(x)))
-
-/*
- * SPGiST dead tuple: declaration for examining non-live tuples
- *
- * The tupstate field of this struct must match those of regular inner and
- * leaf tuples, and its size field must match a leaf tuple's.
- * Also, the pointer field must be in the same place as a leaf tuple's heapPtr
- * field, to satisfy some Asserts that we make when replacing a leaf tuple
- * with a dead tuple.
- * We don't use nextOffset, but it's needed to align the pointer field.
- * pointer and xid are only valid when tupstate = REDIRECT.
- */
typedef struct SpGistDeadTupleData
{
unsigned int tupstate:2, /* LIVE/REDIRECT/DEAD/PLACEHOLDER */
@@ -394,7 +434,6 @@ typedef SpGistDeadTupleData *SpGistDeadTuple;
* size plus sizeof(ItemIdData) (for the line pointer). This works correctly
* so long as tuple sizes are always maxaligned.
*/
-
/* Page capacity after allowing for fixed header and special space */
#define SPGIST_PAGE_CAPACITY \
MAXALIGN_DOWN(BLCKSZ - \
@@ -456,9 +495,10 @@ extern void SpGistInitPage(Page page, uint16 f);
extern void SpGistInitBuffer(Buffer b, uint16 f);
extern void SpGistInitMetapage(Page page);
extern unsigned int SpGistGetTypeSize(SpGistTypeDesc *att, Datum datum);
+extern unsigned int SpgLeafSize(SpGistState *state, Datum *datum, bool *isnull);
extern SpGistLeafTuple spgFormLeafTuple(SpGistState *state,
ItemPointer heapPtr,
- Datum datum, bool isnull);
+ Datum *datum, bool *isnull);
extern SpGistNodeTuple spgFormNodeTuple(SpGistState *state,
Datum label, bool isnull);
extern SpGistInnerTuple spgFormInnerTuple(SpGistState *state,
@@ -466,6 +506,8 @@ extern SpGistInnerTuple spgFormInnerTuple(SpGistState *state,
int nNodes, SpGistNodeTuple *nodes);
extern SpGistDeadTuple spgFormDeadTuple(SpGistState *state, int tupstate,
BlockNumber blkno, OffsetNumber offnum);
+extern void SpGistDeformLeafTuple(SpGistLeafTuple tup, SpGistState *state,
+ Datum *datum, bool *isnull, bool key_value_isnull);
extern Datum *spgExtractNodeLabels(SpGistState *state,
SpGistInnerTuple innerTuple);
extern OffsetNumber SpGistPageAddNewItem(SpGistState *state, Page page,
@@ -484,7 +526,7 @@ extern void spgPageIndexMultiDelete(SpGistState *state, Page page,
int firststate, int reststate,
BlockNumber blkno, OffsetNumber offnum);
extern bool spgdoinsert(Relation index, SpGistState *state,
- ItemPointer heapPtr, Datum datum, bool isnull);
+ ItemPointer heapPtr, Datum *datum, bool *isnull);
/* spgproc.c */
extern double *spg_key_orderbys_distances(Datum key, bool isLeaf,
diff --git a/src/test/regress/expected/amutils.out b/src/test/regress/expected/amutils.out
index d92a6d12c6..93e6a43b6d 100644
--- a/src/test/regress/expected/amutils.out
+++ b/src/test/regress/expected/amutils.out
@@ -169,9 +169,9 @@ select amname, prop, pg_indexam_has_property(a.oid, prop) as p
hash | bogus |
spgist | can_order | f
spgist | can_unique | f
- spgist | can_multi_col | f
+ spgist | can_multi_col | t
spgist | can_exclude | t
- spgist | can_include | f
+ spgist | can_include | t
spgist | bogus |
(36 rows)
diff --git a/src/test/regress/expected/index_including.out b/src/test/regress/expected/index_including.out
index 8e5d53e712..4fd2b7e878 100644
--- a/src/test/regress/expected/index_including.out
+++ b/src/test/regress/expected/index_including.out
@@ -356,7 +356,6 @@ CREATE INDEX on tbl USING brin(c1, c2) INCLUDE (c3, c4);
ERROR: access method "brin" does not support included columns
CREATE INDEX on tbl USING gist(c3) INCLUDE (c1, c4);
CREATE INDEX on tbl USING spgist(c3) INCLUDE (c4);
-ERROR: access method "spgist" does not support included columns
CREATE INDEX on tbl USING gin(c1, c2) INCLUDE (c3, c4);
ERROR: access method "gin" does not support included columns
CREATE INDEX on tbl USING hash(c1, c2) INCLUDE (c3, c4);
diff --git a/src/test/regress/expected/index_including_spgist.out b/src/test/regress/expected/index_including_spgist.out
new file mode 100644
index 0000000000..fa64766fb7
--- /dev/null
+++ b/src/test/regress/expected/index_including_spgist.out
@@ -0,0 +1,139 @@
+/*
+ * 1.1. test CREATE INDEX with buffered build
+ */
+-- Regular index with included columns
+CREATE TABLE tbl_spgist (c1 int, c2 int, c3 int, c4 box);
+-- size is chosen to exceed page size and trigger actual truncation
+INSERT INTO tbl_spgist SELECT x, 2*x, 3*x, box(point(x,x+1),point(2*x,2*x+1)) FROM generate_series(1,8000) AS x;
+CREATE INDEX tbl_spgist_idx ON tbl_spgist using spgist (c4) INCLUDE (c1,c2,c3);
+SELECT pg_get_indexdef(i.indexrelid)
+FROM pg_index i JOIN pg_class c ON i.indexrelid = c.oid
+WHERE i.indrelid = 'tbl_spgist'::regclass ORDER BY c.relname;
+ pg_get_indexdef
+-----------------------------------------------------------------------------------------
+ CREATE INDEX tbl_spgist_idx ON public.tbl_spgist USING spgist (c4) INCLUDE (c1, c2, c3)
+(1 row)
+
+SELECT * FROM tbl_spgist where c4 <@ box(point(1,1),point(10,10));
+ c1 | c2 | c3 | c4
+----+----+----+-------------
+ 1 | 2 | 3 | (2,3),(1,2)
+ 2 | 4 | 6 | (4,5),(2,3)
+ 3 | 6 | 9 | (6,7),(3,4)
+ 4 | 8 | 12 | (8,9),(4,5)
+(4 rows)
+
+SET enable_bitmapscan TO off;
+EXPLAIN (costs off) SELECT * FROM tbl_spgist where c4 <@ box(point(1,1),point(10,10));
+ QUERY PLAN
+----------------------------------------------------
+ Index Only Scan using tbl_spgist_idx on tbl_spgist
+ Index Cond: (c4 <@ '(10,10),(1,1)'::box)
+(2 rows)
+
+SET enable_bitmapscan TO default;
+DROP TABLE tbl_spgist;
+/*
+ * 1.2. test CREATE INDEX with inserts
+ */
+-- Regular index with included columns
+CREATE TABLE tbl_spgist (c1 int, c2 int, c3 int, c4 box);
+-- size is chosen to exceed page size and trigger actual truncation
+CREATE INDEX tbl_spgist_idx ON tbl_spgist using spgist (c4) INCLUDE (c1,c2,c3);
+INSERT INTO tbl_spgist SELECT x, 2*x, 3*x, box(point(x,x+1),point(2*x,2*x+1)) FROM generate_series(1,8000) AS x;
+SELECT pg_get_indexdef(i.indexrelid)
+FROM pg_index i JOIN pg_class c ON i.indexrelid = c.oid
+WHERE i.indrelid = 'tbl_spgist'::regclass ORDER BY c.relname;
+ pg_get_indexdef
+-----------------------------------------------------------------------------------------
+ CREATE INDEX tbl_spgist_idx ON public.tbl_spgist USING spgist (c4) INCLUDE (c1, c2, c3)
+(1 row)
+
+SELECT * FROM tbl_spgist where c4 <@ box(point(1,1),point(10,10));
+ c1 | c2 | c3 | c4
+----+----+----+-------------
+ 1 | 2 | 3 | (2,3),(1,2)
+ 2 | 4 | 6 | (4,5),(2,3)
+ 3 | 6 | 9 | (6,7),(3,4)
+ 4 | 8 | 12 | (8,9),(4,5)
+(4 rows)
+
+SET enable_bitmapscan TO off;
+EXPLAIN (costs off) SELECT * FROM tbl_spgist where c4 <@ box(point(1,1),point(10,10));
+ QUERY PLAN
+----------------------------------------------------
+ Index Only Scan using tbl_spgist_idx on tbl_spgist
+ Index Cond: (c4 <@ '(10,10),(1,1)'::box)
+(2 rows)
+
+SET enable_bitmapscan TO default;
+DROP TABLE tbl_spgist;
+/*
+ * 2. CREATE INDEX CONCURRENTLY
+ */
+CREATE TABLE tbl_spgist (c1 int, c2 int, c3 int, c4 box);
+INSERT INTO tbl_spgist SELECT x, 2*x, 3*x, box(point(x,x+1),point(2*x,2*x+1)) FROM generate_series(1,10) AS x;
+CREATE INDEX CONCURRENTLY tbl_spgist_idx ON tbl_spgist using spgist (c4) INCLUDE (c1,c2,c3);
+SELECT indexdef FROM pg_indexes WHERE tablename = 'tbl_spgist' ORDER BY indexname;
+ indexdef
+-----------------------------------------------------------------------------------------
+ CREATE INDEX tbl_spgist_idx ON public.tbl_spgist USING spgist (c4) INCLUDE (c1, c2, c3)
+(1 row)
+
+DROP TABLE tbl_spgist;
+/*
+ * 3. REINDEX
+ */
+CREATE TABLE tbl_spgist (c1 int, c2 int, c3 int, c4 box);
+INSERT INTO tbl_spgist SELECT x, 2*x, 3*x, box(point(x,x+1),point(2*x,2*x+1)) FROM generate_series(1,10) AS x;
+CREATE INDEX tbl_spgist_idx ON tbl_spgist using spgist (c4) INCLUDE (c1,c3);
+SELECT indexdef FROM pg_indexes WHERE tablename = 'tbl_spgist' ORDER BY indexname;
+ indexdef
+-------------------------------------------------------------------------------------
+ CREATE INDEX tbl_spgist_idx ON public.tbl_spgist USING spgist (c4) INCLUDE (c1, c3)
+(1 row)
+
+REINDEX INDEX tbl_spgist_idx;
+SELECT indexdef FROM pg_indexes WHERE tablename = 'tbl_spgist' ORDER BY indexname;
+ indexdef
+-------------------------------------------------------------------------------------
+ CREATE INDEX tbl_spgist_idx ON public.tbl_spgist USING spgist (c4) INCLUDE (c1, c3)
+(1 row)
+
+ALTER TABLE tbl_spgist DROP COLUMN c1;
+SELECT indexdef FROM pg_indexes WHERE tablename = 'tbl_spgist' ORDER BY indexname;
+ indexdef
+----------
+(0 rows)
+
+DROP TABLE tbl_spgist;
+/*
+ * 4. Update, delete values in indexed table.
+ */
+CREATE TABLE tbl_spgist (c1 int, c2 int, c3 int, c4 box);
+INSERT INTO tbl_spgist SELECT x, 2*x, 3*x, box(point(x,x+1),point(2*x,2*x+1)) FROM generate_series(1,10) AS x;
+CREATE INDEX tbl_spgist_idx ON tbl_spgist using spgist (c4) INCLUDE (c1,c3);
+UPDATE tbl_spgist SET c1 = 100 WHERE c1 = 2;
+UPDATE tbl_spgist SET c1 = 1 WHERE c1 = 3;
+DELETE FROM tbl_spgist WHERE c1 = 5 OR c3 = 12;
+DROP TABLE tbl_spgist;
+/*
+ * 5. Alter column type.
+ */
+CREATE TABLE tbl_spgist (c1 int, c2 int, c3 int, c4 box);
+INSERT INTO tbl_spgist SELECT x, 2*x, 3*x, box(point(x,x+1),point(2*x,2*x+1)) FROM generate_series(1,10) AS x;
+CREATE INDEX tbl_spgist_idx ON tbl_spgist using spgist (c4) INCLUDE (c1,c3);
+ALTER TABLE tbl_spgist ALTER c1 TYPE bigint;
+ALTER TABLE tbl_spgist ALTER c3 TYPE bigint;
+\d tbl_spgist
+ Table "public.tbl_spgist"
+ Column | Type | Collation | Nullable | Default
+--------+---------+-----------+----------+---------
+ c1 | bigint | | |
+ c2 | integer | | |
+ c3 | bigint | | |
+ c4 | box | | |
+Indexes:
+ "tbl_spgist_idx" spgist (c4) INCLUDE (c1, c3)
+
+DROP TABLE tbl_spgist;
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 026ea880cd..985458a1a8 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -50,7 +50,7 @@ test: copy copyselect copydml insert insert_conflict
# ----------
test: create_misc create_operator create_procedure
# These depend on create_misc and create_operator
-test: create_index create_index_spgist create_view index_including index_including_gist
+test: create_index create_index_spgist create_view index_including index_including_gist index_including_spgist
# ----------
# Another group of parallel tests
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index 979d926119..f3df961535 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -68,6 +68,7 @@ test: create_index_spgist
test: create_view
test: index_including
test: index_including_gist
+test: index_including_spgist
test: create_aggregate
test: create_function_3
test: create_cast
diff --git a/src/test/regress/sql/index_including_spgist.sql b/src/test/regress/sql/index_including_spgist.sql
new file mode 100644
index 0000000000..a59e73aa22
--- /dev/null
+++ b/src/test/regress/sql/index_including_spgist.sql
@@ -0,0 +1,81 @@
+/*
+ * 1.1. test CREATE INDEX with buffered build
+ */
+
+-- Regular index with included columns
+CREATE TABLE tbl_spgist (c1 int, c2 int, c3 int, c4 box);
+-- size is chosen to exceed page size and trigger actual truncation
+INSERT INTO tbl_spgist SELECT x, 2*x, 3*x, box(point(x,x+1),point(2*x,2*x+1)) FROM generate_series(1,8000) AS x;
+CREATE INDEX tbl_spgist_idx ON tbl_spgist using spgist (c4) INCLUDE (c1,c2,c3);
+SELECT pg_get_indexdef(i.indexrelid)
+FROM pg_index i JOIN pg_class c ON i.indexrelid = c.oid
+WHERE i.indrelid = 'tbl_spgist'::regclass ORDER BY c.relname;
+SELECT * FROM tbl_spgist where c4 <@ box(point(1,1),point(10,10));
+SET enable_bitmapscan TO off;
+EXPLAIN (costs off) SELECT * FROM tbl_spgist where c4 <@ box(point(1,1),point(10,10));
+SET enable_bitmapscan TO default;
+DROP TABLE tbl_spgist;
+
+/*
+ * 1.2. test CREATE INDEX with inserts
+ */
+
+-- Regular index with included columns
+CREATE TABLE tbl_spgist (c1 int, c2 int, c3 int, c4 box);
+-- size is chosen to exceed page size and trigger actual truncation
+CREATE INDEX tbl_spgist_idx ON tbl_spgist using spgist (c4) INCLUDE (c1,c2,c3);
+INSERT INTO tbl_spgist SELECT x, 2*x, 3*x, box(point(x,x+1),point(2*x,2*x+1)) FROM generate_series(1,8000) AS x;
+SELECT pg_get_indexdef(i.indexrelid)
+FROM pg_index i JOIN pg_class c ON i.indexrelid = c.oid
+WHERE i.indrelid = 'tbl_spgist'::regclass ORDER BY c.relname;
+SELECT * FROM tbl_spgist where c4 <@ box(point(1,1),point(10,10));
+SET enable_bitmapscan TO off;
+EXPLAIN (costs off) SELECT * FROM tbl_spgist where c4 <@ box(point(1,1),point(10,10));
+SET enable_bitmapscan TO default;
+DROP TABLE tbl_spgist;
+
+/*
+ * 2. CREATE INDEX CONCURRENTLY
+ */
+CREATE TABLE tbl_spgist (c1 int, c2 int, c3 int, c4 box);
+INSERT INTO tbl_spgist SELECT x, 2*x, 3*x, box(point(x,x+1),point(2*x,2*x+1)) FROM generate_series(1,10) AS x;
+CREATE INDEX CONCURRENTLY tbl_spgist_idx ON tbl_spgist using spgist (c4) INCLUDE (c1,c2,c3);
+SELECT indexdef FROM pg_indexes WHERE tablename = 'tbl_spgist' ORDER BY indexname;
+DROP TABLE tbl_spgist;
+
+
+/*
+ * 3. REINDEX
+ */
+CREATE TABLE tbl_spgist (c1 int, c2 int, c3 int, c4 box);
+INSERT INTO tbl_spgist SELECT x, 2*x, 3*x, box(point(x,x+1),point(2*x,2*x+1)) FROM generate_series(1,10) AS x;
+CREATE INDEX tbl_spgist_idx ON tbl_spgist using spgist (c4) INCLUDE (c1,c3);
+SELECT indexdef FROM pg_indexes WHERE tablename = 'tbl_spgist' ORDER BY indexname;
+REINDEX INDEX tbl_spgist_idx;
+SELECT indexdef FROM pg_indexes WHERE tablename = 'tbl_spgist' ORDER BY indexname;
+ALTER TABLE tbl_spgist DROP COLUMN c1;
+SELECT indexdef FROM pg_indexes WHERE tablename = 'tbl_spgist' ORDER BY indexname;
+DROP TABLE tbl_spgist;
+
+/*
+ * 4. Update, delete values in indexed table.
+ */
+CREATE TABLE tbl_spgist (c1 int, c2 int, c3 int, c4 box);
+INSERT INTO tbl_spgist SELECT x, 2*x, 3*x, box(point(x,x+1),point(2*x,2*x+1)) FROM generate_series(1,10) AS x;
+CREATE INDEX tbl_spgist_idx ON tbl_spgist using spgist (c4) INCLUDE (c1,c3);
+UPDATE tbl_spgist SET c1 = 100 WHERE c1 = 2;
+UPDATE tbl_spgist SET c1 = 1 WHERE c1 = 3;
+DELETE FROM tbl_spgist WHERE c1 = 5 OR c3 = 12;
+DROP TABLE tbl_spgist;
+
+/*
+ * 5. Alter column type.
+ */
+CREATE TABLE tbl_spgist (c1 int, c2 int, c3 int, c4 box);
+INSERT INTO tbl_spgist SELECT x, 2*x, 3*x, box(point(x,x+1),point(2*x,2*x+1)) FROM generate_series(1,10) AS x;
+CREATE INDEX tbl_spgist_idx ON tbl_spgist using spgist (c4) INCLUDE (c1,c3);
+ALTER TABLE tbl_spgist ALTER c1 TYPE bigint;
+ALTER TABLE tbl_spgist ALTER c3 TYPE bigint;
+\d tbl_spgist
+DROP TABLE tbl_spgist;
+
--
2.28.0
Hi!
17 авг. 2020 г., в 21:04, Pavel Borisov <pashkin.elfe@gmail.com> написал(а):
Postgres Professional: http://postgrespro.com
<v6-0001-Covering-SP-GiST-index-support-for-INCLUDE-column.patch>
I'm looking into the patch. I have few notes:
1. I see that in src/backend/access/spgist/README you describe SP-GiST tuple as sequence of {Value, ItemPtr to heap, Included attributes}. Is it different from regular IndexTuple where tid is within TupleHeader?
2. Instead of cluttering tuple->nextOffset with bit flags we could just change Tuple Header for leaf tuples with covering indexes. Interpret tuples for indexes with included attributes differently, iff it makes code cleaner. There are so many changes with SGLT_SET_OFFSET\SGLT_GET_OFFSET that it seems viable to put some effort into research of other ways to represent two bits for null mask and varatts.
3. Comment "* SPGiST dead tuple: declaration for examining non-live tuples" does not precede relevant code. because struct SpGistDeadTupleData was not moved.
Thanks!
Best regards, Andrey Borodin.
I'm looking into the patch. I have few notes:
1. I see that in src/backend/access/spgist/README you describe SP-GiST
tuple as sequence of {Value, ItemPtr to heap, Included attributes}. Is it
different from regular IndexTuple where tid is within TupleHeader?
Yes, the header of SpGist tuple is put down in a little bit different way
than index tuple. It is also intended to connect spgist leaf tuples in
chains on a leaf page so it already have more complex layout and bigger
size that index tuple header.
SpGist tuple header size is 12 bytes which is a maxaligned value for 32 bit
architectures, and key value can start just after it without any gap. This
is of value, as unnecessary index size increase slows down performance and
is evil anyway. The only part of this which is left now is a gap
between SpGist tuple header and first value on 64 bit architecture (as
maxalign value in this case is 16 bytes and 4 bytes per tuple can be
saved). But I was discouraged to change this on the reason of binary
compatibility with indexes built before and complexity of the change also,
as quite many things in the code do depend on this maxaligned header (for
inner and dead tuples also).
Another difference is that SpGist nulls mask is inserted after the key
value before the first included one and apply only to included values. It
is not needed for key values, as null key values in SpGist are stored in
separate tree, and it is not needed to mark it null second time. Also nulls
mask size in Spgist does depend on the number of included values in a
tuple, unlike in IndexTuple which contains redundant nulls mask for all
possible INDEX_MAX_KEYS. In certain cases we can store nulls mask in free
bytes after key value before typealign of first included value. (E.g. if
key value is varchar (radix tree) statistically we have only 1/8 of keys
finishing exactly an maxalign, the others will have a natural gap for nulls
mask.)
2. Instead of cluttering tuple->nextOffset with bit flags we could just
change Tuple Header for leaf tuples with covering indexes. Interpret tuples
for indexes with included attributes differently, iff it makes code
cleaner. There are so many changes with SGLT_SET_OFFSET\SGLT_GET_OFFSET
that it seems viable to put some effort into research of other ways to
represent two bits for null mask and varatts.
Of course SpGist header can be done different for index with and without
included columns. I see two reasons against this:
1. It will be needed to integrate many ifs and in many places keep in mind
whether the index contains included values. It is expected to be much more
code than now and not only in the parts which integrates included values to
leaf tuples. I think this vast changes can puzzle reader much more than
just two small macros evenly copy-pasted in the code.
2. I also see no need to increase SpGist tuple size just for inserting two
bits which are now stored free of charge. I consulted with bit flags
storage in IndexTupleData.t_tid and did it in a similar way. Macros for
GET/SET are basically needed to make bit flags and offset modification
independent and safe in any place of a code.
I added some extra comments and mentions in manual to make all the things
clear (see v7 patch)
3. Comment "* SPGiST dead tuple: declaration for examining non-live
tuples" does not precede relevant code. because struct SpGistDeadTupleData
was not moved.
You are right, thank you! Corrected this and also removed some unnecessary
declarations.
Thank you for your attention to the patch!
Attachments:
v7-0001-Covering-SP-GiST-index-support-for-INCLUDE-column.patchapplication/octet-stream; name=v7-0001-Covering-SP-GiST-index-support-for-INCLUDE-column.patchDownload
From 083e2c691a912fc1555626f80b49c02f4c4aaf11 Mon Sep 17 00:00:00 2001
From: Pavel Borisov <pashkin.elfe@gmail.com>
Date: Mon, 24 Aug 2020 17:17:31 +0400
Subject: [PATCH v7] Covering SP-GiST index - support for INCLUDE columns
Adding INCLUDE colums for SPGiST index is intended to increase the speed of queries by making scan index only likewise
in btree and GiST index. These included values are added only to leaf tuples and they are not used in index tree search
but they can be fetched during index scan.
The other point of included columns is to overcome SP-GiST limitation of being single-column in principle. I.e. in
certain cases a single covering SP-GiST index can replace several separate ones with less disk space and shared buffers
consumption, faster update etc. Also there can be included any data types without SP-GiST supported opclasses.
Discussion: https://www.postgresql.org/message-id/flat/CALT9ZEFi-vMp4faht9f9Junb1nO3NOSjhpxTmbm1UGLMsLqiEQ@mail.gmail.com
---
doc/src/sgml/indices.sgml | 4 +-
doc/src/sgml/ref/create_index.sgml | 4 +-
doc/src/sgml/spgist.sgml | 8 +
src/backend/access/spgist/README | 18 +-
src/backend/access/spgist/spgdoinsert.c | 172 +++++---
src/backend/access/spgist/spginsert.c | 5 +-
src/backend/access/spgist/spgscan.c | 87 +++-
src/backend/access/spgist/spgutils.c | 385 ++++++++++++++++--
src/backend/access/spgist/spgvacuum.c | 25 +-
src/backend/access/spgist/spgxlog.c | 6 +-
src/include/access/spgist_private.h | 265 +++++++-----
src/test/regress/expected/amutils.out | 4 +-
src/test/regress/expected/index_including.out | 1 -
.../expected/index_including_spgist.out | 139 +++++++
src/test/regress/parallel_schedule | 2 +-
src/test/regress/serial_schedule | 1 +
.../regress/sql/index_including_spgist.sql | 80 ++++
17 files changed, 968 insertions(+), 238 deletions(-)
create mode 100644 src/test/regress/expected/index_including_spgist.out
create mode 100644 src/test/regress/sql/index_including_spgist.sql
diff --git a/doc/src/sgml/indices.sgml b/doc/src/sgml/indices.sgml
index 28adaba72d..c89cc6cb08 100644
--- a/doc/src/sgml/indices.sgml
+++ b/doc/src/sgml/indices.sgml
@@ -1194,8 +1194,8 @@ CREATE UNIQUE INDEX tab_x_y ON tab(x) INCLUDE (y);
likely to not need to access the heap. If the heap tuple must be visited
anyway, it costs nothing more to get the column's value from there.
Other restrictions are that expressions are not currently supported as
- included columns, and that only B-tree and GiST indexes currently support
- included columns.
+ included columns, and that only B-tree, GiST and SP-GiST indexes currently
+ support included columns.
</para>
<para>
diff --git a/doc/src/sgml/ref/create_index.sgml b/doc/src/sgml/ref/create_index.sgml
index ff87b2d28f..3d360bcf47 100644
--- a/doc/src/sgml/ref/create_index.sgml
+++ b/doc/src/sgml/ref/create_index.sgml
@@ -187,8 +187,8 @@ CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ [ IF NOT EXISTS ] <replaceable class=
</para>
<para>
- Currently, the B-tree and the GiST index access methods support this
- feature. In B-tree and the GiST indexes, the values of columns listed
+ Currently, the B-tree, GiST and SP-GiST index access methods support
+ this feature. In these indexes, the values of columns listed
in the <literal>INCLUDE</literal> clause are included in leaf tuples
which correspond to heap tuples, but are not included in upper-level
index entries used for tree navigation.
diff --git a/doc/src/sgml/spgist.sgml b/doc/src/sgml/spgist.sgml
index 0e04a08679..868a140a6a 100644
--- a/doc/src/sgml/spgist.sgml
+++ b/doc/src/sgml/spgist.sgml
@@ -240,6 +240,14 @@
inner tuples that are passed through to reach the leaf level.
</para>
+ <para>
+ In case when <acronym>SP-GiST</acronym> index is created with
+ <literal>INCLUDE</literal> clause i.e. covering index, leaf tuples also
+ contain data from included columns. This data is stored uncompressed and can have
+ data types without any SP-GiST operator class.
+
+ </para>
+
<para>
Inner tuples are more complex, since they are branching points in the
search tree. Each inner tuple contains a set of one or more
diff --git a/src/backend/access/spgist/README b/src/backend/access/spgist/README
index b55b073832..636747a6a8 100644
--- a/src/backend/access/spgist/README
+++ b/src/backend/access/spgist/README
@@ -73,9 +73,20 @@ Leaf tuple consists of:
Example:
radix tree - the rest of string (postfix)
quad and k-d tree - the point itself
-
ItemPointer to the heap
-
+ nextOffset number that points to next leaf tuple in a chain
+ optional nullmask for included column values
+ optional included colums values
+
+Parts of leaf tuple are laid out to make the header and the key value
+placement unchanged in case of index with and without included values and
+backward compatible. Also it is intended to be aligned with minimum possible
+gaps to make index smaller. I.e. first header of 12 bytes, then a key value
+starting from maxalign boundary, then just immediately nulls mask bytes,
+then included attributes each starting from its typealign boundary. So in
+many cases nulls mask is stored free of charge and tuple occupy minimum
+possible space (with exception of gap before key value which starts from
+maxalign for compatibility).
NULLS HANDLING
@@ -90,6 +101,9 @@ Insertions and searches in the nulls tree do not use any of the
opclass-supplied functions, but just use hardwired logic comparable to
AllTheSame cases in the normal tree.
+For included attributes nulls are handled in ordinary per leaf-tuple way i.e.
+there is null mask presence bit in a header and, if it is true, nullmask is
+added just after key value before the first included attribute.
INSERTION ALGORITHM
diff --git a/src/backend/access/spgist/spgdoinsert.c b/src/backend/access/spgist/spgdoinsert.c
index 934d65b89f..4c133b7106 100644
--- a/src/backend/access/spgist/spgdoinsert.c
+++ b/src/backend/access/spgist/spgdoinsert.c
@@ -22,7 +22,7 @@
#include "miscadmin.h"
#include "storage/bufmgr.h"
#include "utils/rel.h"
-
+#include "access/htup_details.h"
/*
* SPPageDesc tracks all info about a page we are inserting into. In some
@@ -220,7 +220,7 @@ addLeafTuple(Relation index, SpGistState *state, SpGistLeafTuple leafTuple,
SpGistBlockIsRoot(current->blkno))
{
/* Tuple is not part of a chain */
- leafTuple->nextOffset = InvalidOffsetNumber;
+ SGLT_SET_OFFSET(leafTuple->nextOffset, InvalidOffsetNumber);
current->offnum = SpGistPageAddNewItem(state, current->page,
(Item) leafTuple, leafTuple->size,
NULL, false);
@@ -253,7 +253,7 @@ addLeafTuple(Relation index, SpGistState *state, SpGistLeafTuple leafTuple,
PageGetItemId(current->page, current->offnum));
if (head->tupstate == SPGIST_LIVE)
{
- leafTuple->nextOffset = head->nextOffset;
+ SGLT_SET_OFFSET(leafTuple->nextOffset, SGLT_GET_OFFSET(head->nextOffset));
offnum = SpGistPageAddNewItem(state, current->page,
(Item) leafTuple, leafTuple->size,
NULL, false);
@@ -264,14 +264,14 @@ addLeafTuple(Relation index, SpGistState *state, SpGistLeafTuple leafTuple,
*/
head = (SpGistLeafTuple) PageGetItem(current->page,
PageGetItemId(current->page, current->offnum));
- head->nextOffset = offnum;
+ SGLT_SET_OFFSET(head->nextOffset, offnum);
xlrec.offnumLeaf = offnum;
xlrec.offnumHeadLeaf = current->offnum;
}
else if (head->tupstate == SPGIST_DEAD)
{
- leafTuple->nextOffset = InvalidOffsetNumber;
+ SGLT_SET_OFFSET(leafTuple->nextOffset, InvalidOffsetNumber);
PageIndexTupleDelete(current->page, current->offnum);
if (PageAddItem(current->page,
(Item) leafTuple, leafTuple->size,
@@ -362,13 +362,13 @@ checkSplitConditions(Relation index, SpGistState *state,
{
/* We could see a DEAD tuple as first/only chain item */
Assert(i == current->offnum);
- Assert(it->nextOffset == InvalidOffsetNumber);
+ Assert(SGLT_GET_OFFSET(it->nextOffset) == InvalidOffsetNumber);
/* Don't count it in result, because it won't go to other page */
}
else
elog(ERROR, "unexpected SPGiST tuple state: %d", it->tupstate);
- i = it->nextOffset;
+ i = SGLT_GET_OFFSET(it->nextOffset);
}
*nToSplit = n;
@@ -437,7 +437,7 @@ moveLeafs(Relation index, SpGistState *state,
{
/* We could see a DEAD tuple as first/only chain item */
Assert(i == current->offnum);
- Assert(it->nextOffset == InvalidOffsetNumber);
+ Assert(SGLT_GET_OFFSET(it->nextOffset) == InvalidOffsetNumber);
/* We don't want to move it, so don't count it in size */
toDelete[nDelete] = i;
nDelete++;
@@ -446,7 +446,7 @@ moveLeafs(Relation index, SpGistState *state,
else
elog(ERROR, "unexpected SPGiST tuple state: %d", it->tupstate);
- i = it->nextOffset;
+ i = SGLT_GET_OFFSET(it->nextOffset);
}
/* Find a leaf page that will hold them */
@@ -475,7 +475,7 @@ moveLeafs(Relation index, SpGistState *state,
* don't care). We're modifying the tuple on the source page
* here, but it's okay since we're about to delete it.
*/
- it->nextOffset = r;
+ SGLT_SET_OFFSET(it->nextOffset, r);
r = SpGistPageAddNewItem(state, npage, (Item) it, it->size,
&startOffset, false);
@@ -490,7 +490,7 @@ moveLeafs(Relation index, SpGistState *state,
}
/* add the new tuple as well */
- newLeafTuple->nextOffset = r;
+ SGLT_SET_OFFSET(newLeafTuple->nextOffset, r);
r = SpGistPageAddNewItem(state, npage,
(Item) newLeafTuple, newLeafTuple->size,
&startOffset, false);
@@ -709,6 +709,9 @@ doPickSplit(Relation index, SpGistState *state,
int nToDelete,
nToInsert,
maxToInclude;
+ Datum *leafChainDatums;
+ bool *leafChainIsnulls;
+ const int natts = IndexRelationGetNumberOfAttributes(index);
in.level = level;
@@ -723,14 +726,16 @@ doPickSplit(Relation index, SpGistState *state,
toInsert = (OffsetNumber *) palloc(sizeof(OffsetNumber) * n);
newLeafs = (SpGistLeafTuple *) palloc(sizeof(SpGistLeafTuple) * n);
leafPageSelect = (uint8 *) palloc(sizeof(uint8) * n);
-
STORE_STATE(state, xlrec.stateSrc);
+ leafChainDatums = (Datum *) palloc(n * natts * sizeof(Datum));
+ leafChainIsnulls = (bool *) palloc(n * natts * sizeof(bool));
+
/*
- * Form list of leaf tuples which will be distributed as split result;
- * also, count up the amount of space that will be freed from current.
- * (Note that in the non-root case, we won't actually delete the old
- * tuples, only replace them with redirects or placeholders.)
+ * Collect leaf tuples which will be distributed as split result; also,
+ * count up the amount of space that will be freed from current. (Note
+ * that in the non-root case, we won't actually delete the old tuples,
+ * only replace them with redirects or placeholders.)
*
* Note: the SGLTDATUM calls here are safe even when dealing with a nulls
* page. For a pass-by-value data type we will fetch a word that must
@@ -738,7 +743,15 @@ doPickSplit(Relation index, SpGistState *state,
* tuples must have size at least SGDTSIZE). For a pass-by-reference type
* we are just computing a pointer that isn't going to get dereferenced.
* So it's not worth guarding the calls with isNulls checks.
+ *
+ * Datums and isnulls of all leaf tuple attributes in a chain are
+ * collected into 2-d arrays: (number of tuples in chain) x (number of
+ * attributes) First attribute is key, the other - included attributes (if
+ * any). After picksplit we need to form new leaf tuples as key attribute
+ * length can change which can affect alignment of every include
+ * attribute.
*/
+
nToInsert = 0;
nToDelete = 0;
spaceToDelete = 0;
@@ -759,6 +772,8 @@ doPickSplit(Relation index, SpGistState *state,
{
in.datums[nToInsert] = SGLTDATUM(it, state);
heapPtrs[nToInsert] = it->heapPtr;
+ SpGistDeformLeafTuple(it, state, leafChainDatums + nToInsert * natts,
+ leafChainIsnulls + nToInsert * natts, isNulls);
nToInsert++;
toDelete[nToDelete] = i;
nToDelete++;
@@ -784,6 +799,9 @@ doPickSplit(Relation index, SpGistState *state,
{
in.datums[nToInsert] = SGLTDATUM(it, state);
heapPtrs[nToInsert] = it->heapPtr;
+
+ SpGistDeformLeafTuple(it, state, leafChainDatums + nToInsert * natts,
+ leafChainIsnulls + nToInsert * natts, isNulls);
nToInsert++;
toDelete[nToDelete] = i;
nToDelete++;
@@ -795,7 +813,7 @@ doPickSplit(Relation index, SpGistState *state,
{
/* We could see a DEAD tuple as first/only chain item */
Assert(i == current->offnum);
- Assert(it->nextOffset == InvalidOffsetNumber);
+ Assert(SGLT_GET_OFFSET(it->nextOffset) == InvalidOffsetNumber);
toDelete[nToDelete] = i;
nToDelete++;
/* replacing it with redirect will save no space */
@@ -803,7 +821,7 @@ doPickSplit(Relation index, SpGistState *state,
else
elog(ERROR, "unexpected SPGiST tuple state: %d", it->tupstate);
- i = it->nextOffset;
+ i = SGLT_GET_OFFSET(it->nextOffset);
}
}
in.nTuples = nToInsert;
@@ -816,10 +834,17 @@ doPickSplit(Relation index, SpGistState *state,
*/
in.datums[in.nTuples] = SGLTDATUM(newLeafTuple, state);
heapPtrs[in.nTuples] = newLeafTuple->heapPtr;
+
+ SpGistDeformLeafTuple(newLeafTuple, state, leafChainDatums + (in.nTuples) * natts,
+ leafChainIsnulls + (in.nTuples) * natts, isNulls);
in.nTuples++;
memset(&out, 0, sizeof(out));
+ /*
+ * Process collected key values of tuples from the chain. Included values
+ * are used to build fresh leaf tuples unchanged.
+ */
if (!isNulls)
{
/*
@@ -837,9 +862,11 @@ doPickSplit(Relation index, SpGistState *state,
totalLeafSizes = 0;
for (i = 0; i < in.nTuples; i++)
{
- newLeafs[i] = spgFormLeafTuple(state, heapPtrs + i,
- out.leafTupleDatums[i],
- false);
+ *(leafChainDatums + i * natts) = (Datum) out.leafTupleDatums[i];
+ *(leafChainIsnulls + i * natts) = false;
+
+ newLeafs[i] = spgFormLeafTuple(state, heapPtrs + i, leafChainDatums + i * natts,
+ leafChainIsnulls + i * natts);
totalLeafSizes += newLeafs[i]->size + sizeof(ItemIdData);
}
}
@@ -860,9 +887,14 @@ doPickSplit(Relation index, SpGistState *state,
totalLeafSizes = 0;
for (i = 0; i < in.nTuples; i++)
{
- newLeafs[i] = spgFormLeafTuple(state, heapPtrs + i,
- (Datum) 0,
- true);
+ /*
+ * Nulls tree can contain only null key values.
+ */
+ *(leafChainDatums + i * natts) = (Datum) 0;
+ *(leafChainIsnulls + i * natts) = true;
+
+ newLeafs[i] = spgFormLeafTuple(state, heapPtrs + i, leafChainDatums + i * natts,
+ leafChainIsnulls + i * natts);
totalLeafSizes += newLeafs[i]->size + sizeof(ItemIdData);
}
}
@@ -1196,10 +1228,10 @@ doPickSplit(Relation index, SpGistState *state,
if (ItemPointerIsValid(&nodes[n]->t_tid))
{
Assert(ItemPointerGetBlockNumber(&nodes[n]->t_tid) == leafBlock);
- it->nextOffset = ItemPointerGetOffsetNumber(&nodes[n]->t_tid);
+ SGLT_SET_OFFSET(it->nextOffset, ItemPointerGetOffsetNumber(&nodes[n]->t_tid));
}
else
- it->nextOffset = InvalidOffsetNumber;
+ SGLT_SET_OFFSET(it->nextOffset, InvalidOffsetNumber);
/* Insert it on page */
newoffset = SpGistPageAddNewItem(state, BufferGetPage(leafBuffer),
@@ -1889,67 +1921,83 @@ spgSplitNodeAction(Relation index, SpGistState *state,
*/
bool
spgdoinsert(Relation index, SpGistState *state,
- ItemPointer heapPtr, Datum datum, bool isnull)
+ ItemPointer heapPtr, Datum *datum, bool *isnull)
{
int level = 0;
- Datum leafDatum;
+ Datum *leafDatum;
int leafSize;
SPPageDesc current,
parent;
FmgrInfo *procinfo = NULL;
+ int i;
/*
* Look up FmgrInfo of the user-defined choose function once, to save
* cycles in the loop below.
*/
- if (!isnull)
+ if (!isnull[0])
procinfo = index_getprocinfo(index, 1, SPGIST_CHOOSE_PROC);
/*
* Prepare the leaf datum to insert.
- *
+ */
+
+ leafDatum = (Datum *) palloc0(sizeof(Datum) * (IndexRelationGetNumberOfAttributes(index)));
+
+ /*
* If an optional "compress" method is provided, then call it to form the
- * leaf datum from the input datum. Otherwise store the input datum as
- * is. Since we don't use index_form_tuple in this AM, we have to make
- * sure value to be inserted is not toasted; FormIndexDatum doesn't
- * guarantee that. But we assume the "compress" method to return an
- * untoasted value.
+ * key datum from the input datum. Otherwise store the input datum as is.
+ * Since we don't use index_form_tuple in this AM, we have to make sure
+ * value to be inserted is not toasted; FormIndexDatum doesn't guarantee
+ * that. But we assume the "compress" method to return an untoasted
+ * value.
*/
- if (!isnull)
+ if (!isnull[0])
{
if (OidIsValid(index_getprocid(index, 1, SPGIST_COMPRESS_PROC)))
{
FmgrInfo *compressProcinfo = NULL;
compressProcinfo = index_getprocinfo(index, 1, SPGIST_COMPRESS_PROC);
- leafDatum = FunctionCall1Coll(compressProcinfo,
- index->rd_indcollation[0],
- datum);
+ leafDatum[0] = FunctionCall1Coll(compressProcinfo,
+ index->rd_indcollation[0],
+ datum[0]);
}
else
{
Assert(state->attLeafType.type == state->attType.type);
if (state->attType.attlen == -1)
- leafDatum = PointerGetDatum(PG_DETOAST_DATUM(datum));
+ leafDatum[0] = PointerGetDatum(PG_DETOAST_DATUM(datum[0]));
else
- leafDatum = datum;
+ leafDatum[0] = datum[0];
}
}
else
- leafDatum = (Datum) 0;
+ leafDatum[0] = (Datum) 0;
+
+ for (i = 1; i < IndexRelationGetNumberOfAttributes(index); i++)
+ {
+ if (!isnull[i])
+ {
+ if (TupleDescAttr(state->includeTupdesc, i - 1)->attlen == -1)
+ leafDatum[i] = PointerGetDatum(PG_DETOAST_DATUM(datum[i]));
+ else
+ leafDatum[i] = datum[i];
+ }
+ else
+ leafDatum[i] = (Datum) 0;
+ }
+
/*
- * Compute space needed for a leaf tuple containing the given datum.
+ * Compute space needed on a page for a leaf tuple containing the given
+ * datum.
*
* If it isn't gonna fit, and the opclass can't reduce the datum size by
* suffixing, bail out now rather than getting into an endless loop.
*/
- if (!isnull)
- leafSize = SGLTHDRSZ + sizeof(ItemIdData) +
- SpGistGetTypeSize(&state->attLeafType, leafDatum);
- else
- leafSize = SGDTSIZE + sizeof(ItemIdData);
+ leafSize = SpgLeafSize(state, leafDatum, isnull) + sizeof(ItemIdData);
if (leafSize > SPGIST_PAGE_CAPACITY && !state->config.longValuesOK)
ereport(ERROR,
@@ -1961,7 +2009,7 @@ spgdoinsert(Relation index, SpGistState *state,
errhint("Values larger than a buffer page cannot be indexed.")));
/* Initialize "current" to the appropriate root page */
- current.blkno = isnull ? SPGIST_NULL_BLKNO : SPGIST_ROOT_BLKNO;
+ current.blkno = isnull[0] ? SPGIST_NULL_BLKNO : SPGIST_ROOT_BLKNO;
current.buffer = InvalidBuffer;
current.page = NULL;
current.offnum = FirstOffsetNumber;
@@ -1995,7 +2043,7 @@ spgdoinsert(Relation index, SpGistState *state,
*/
current.buffer =
SpGistGetBuffer(index,
- GBUF_LEAF | (isnull ? GBUF_NULLS : 0),
+ GBUF_LEAF | (isnull[0] ? GBUF_NULLS : 0),
Min(leafSize, SPGIST_PAGE_CAPACITY),
&isNew);
current.blkno = BufferGetBlockNumber(current.buffer);
@@ -2037,7 +2085,7 @@ spgdoinsert(Relation index, SpGistState *state,
current.page = BufferGetPage(current.buffer);
/* should not arrive at a page of the wrong type */
- if (isnull ? !SpGistPageStoresNulls(current.page) :
+ if (isnull[0] ? !SpGistPageStoresNulls(current.page) :
SpGistPageStoresNulls(current.page))
elog(ERROR, "SPGiST index page %u has wrong nulls flag",
current.blkno);
@@ -2054,7 +2102,7 @@ spgdoinsert(Relation index, SpGistState *state,
{
/* it fits on page, so insert it and we're done */
addLeafTuple(index, state, leafTuple,
- ¤t, &parent, isnull, isNew);
+ ¤t, &parent, isnull[0], isNew);
break;
}
else if ((sizeToSplit =
@@ -2068,14 +2116,14 @@ spgdoinsert(Relation index, SpGistState *state,
* chain to another leaf page rather than splitting it.
*/
Assert(!isNew);
- moveLeafs(index, state, ¤t, &parent, leafTuple, isnull);
+ moveLeafs(index, state, ¤t, &parent, leafTuple, isnull[0]);
break; /* we're done */
}
else
{
/* picksplit */
if (doPickSplit(index, state, ¤t, &parent,
- leafTuple, level, isnull, isNew))
+ leafTuple, level, isnull[0], isNew))
break; /* doPickSplit installed new tuples */
/* leaf tuple will not be inserted yet */
@@ -2110,8 +2158,8 @@ spgdoinsert(Relation index, SpGistState *state,
innerTuple = (SpGistInnerTuple) PageGetItem(current.page,
PageGetItemId(current.page, current.offnum));
- in.datum = datum;
- in.leafDatum = leafDatum;
+ in.datum = datum[0];
+ in.leafDatum = leafDatum[0];
in.level = level;
in.allTheSame = innerTuple->allTheSame;
in.hasPrefix = (innerTuple->prefixSize > 0);
@@ -2121,7 +2169,7 @@ spgdoinsert(Relation index, SpGistState *state,
memset(&out, 0, sizeof(out));
- if (!isnull)
+ if (!isnull[0])
{
/* use user-defined choose method */
FunctionCall2Coll(procinfo,
@@ -2158,11 +2206,11 @@ spgdoinsert(Relation index, SpGistState *state,
/* Adjust level as per opclass request */
level += out.result.matchNode.levelAdd;
/* Replace leafDatum and recompute leafSize */
- if (!isnull)
+ if (!isnull[0])
{
- leafDatum = out.result.matchNode.restDatum;
- leafSize = SGLTHDRSZ + sizeof(ItemIdData) +
- SpGistGetTypeSize(&state->attLeafType, leafDatum);
+ leafDatum[0] = out.result.matchNode.restDatum;
+ leafSize = SpgLeafSize(state, leafDatum, isnull) +
+ sizeof(ItemIdData);
}
/*
@@ -2227,6 +2275,6 @@ spgdoinsert(Relation index, SpGistState *state,
SpGistSetLastUsedPage(index, parent.buffer);
UnlockReleaseBuffer(parent.buffer);
}
-
+ pfree(leafDatum);
return true;
}
diff --git a/src/backend/access/spgist/spginsert.c b/src/backend/access/spgist/spginsert.c
index e4508a2b92..b54ae85f6e 100644
--- a/src/backend/access/spgist/spginsert.c
+++ b/src/backend/access/spgist/spginsert.c
@@ -55,8 +55,7 @@ spgistBuildCallback(Relation index, ItemPointer tid, Datum *values,
* lock on some buffer. So we need to be willing to retry. We can flush
* any temp data when retrying.
*/
- while (!spgdoinsert(index, &buildstate->spgstate, tid,
- *values, *isnull))
+ while (!spgdoinsert(index, &buildstate->spgstate, tid, values, isnull))
{
MemoryContextReset(buildstate->tmpCtx);
}
@@ -226,7 +225,7 @@ spginsert(Relation index, Datum *values, bool *isnull,
* to avoid cumulative memory consumption. That means we also have to
* redo initSpGistState(), but it's cheap enough not to matter.
*/
- while (!spgdoinsert(index, &spgstate, ht_ctid, *values, *isnull))
+ while (!spgdoinsert(index, &spgstate, ht_ctid, values, isnull))
{
MemoryContextReset(insertCtx);
initSpGistState(&spgstate, index);
diff --git a/src/backend/access/spgist/spgscan.c b/src/backend/access/spgist/spgscan.c
index 4d506bfb9a..5a3c7c50cf 100644
--- a/src/backend/access/spgist/spgscan.c
+++ b/src/backend/access/spgist/spgscan.c
@@ -28,7 +28,8 @@
typedef void (*storeRes_func) (SpGistScanOpaque so, ItemPointer heapPtr,
Datum leafValue, bool isNull, bool recheck,
- bool recheckDistances, double *distances);
+ bool recheckDistances, double *distances,
+ SpGistLeafTuple leafTuple);
/*
* Pairing heap comparison function for the SpGistSearchItem queue.
@@ -88,6 +89,9 @@ spgFreeSearchItem(SpGistScanOpaque so, SpGistSearchItem *item)
if (item->traversalValue)
pfree(item->traversalValue);
+ if (item->isLeaf && item->leafTuple)
+ pfree(item->leafTuple);
+
pfree(item);
}
@@ -134,6 +138,8 @@ spgAddStartItem(SpGistScanOpaque so, bool isnull)
startEntry->recheck = false;
startEntry->recheckDistances = false;
+ startEntry->leafTuple = NULL;
+
spgAddSearchItemToQueue(so, startEntry);
}
@@ -438,14 +444,30 @@ spgendscan(IndexScanDesc scan)
* Leaf SpGistSearchItem constructor, called in queue context
*/
static SpGistSearchItem *
-spgNewHeapItem(SpGistScanOpaque so, int level, ItemPointer heapPtr,
+spgNewHeapItem(SpGistScanOpaque so, int level, SpGistLeafTuple leafTuple,
Datum leafValue, bool recheck, bool recheckDistances,
bool isnull, double *distances)
{
SpGistSearchItem *item = spgAllocSearchItem(so, isnull, distances);
+ /*
+ * If there are include attributes search item in the queue should contain
+ * them.
+ */
+ if (so->state.includeTupdesc)
+ {
+ Assert(so->state.includeTupdesc->natts);
+
+ item->leafTuple = palloc(leafTuple->size);
+ memcpy(item->leafTuple, leafTuple, leafTuple->size);
+ }
+ else
+ {
+ item->leafTuple = NULL;
+ }
+
item->level = level;
- item->heapPtr = *heapPtr;
+ item->heapPtr = leafTuple->heapPtr;
/* copy value to queue cxt out of tmp cxt */
item->value = isnull ? (Datum) 0 :
datumCopy(leafValue, so->state.attLeafType.attbyval,
@@ -503,6 +525,8 @@ spgLeafTest(SpGistScanOpaque so, SpGistSearchItem *item,
in.returnData = so->want_itup;
in.leafDatum = SGLTDATUM(leafTuple, &so->state);
+
+
out.leafValue = (Datum) 0;
out.recheck = false;
out.distances = NULL;
@@ -528,7 +552,7 @@ spgLeafTest(SpGistScanOpaque so, SpGistSearchItem *item,
/* the scan is ordered -> add the item to the queue */
MemoryContext oldCxt = MemoryContextSwitchTo(so->traversalCxt);
SpGistSearchItem *heapItem = spgNewHeapItem(so, item->level,
- &leafTuple->heapPtr,
+ leafTuple,
leafValue,
recheck,
recheckDistances,
@@ -543,8 +567,10 @@ spgLeafTest(SpGistScanOpaque so, SpGistSearchItem *item,
{
/* non-ordered scan, so report the item right away */
Assert(!recheckDistances);
+
storeRes(so, &leafTuple->heapPtr, leafValue, isnull,
- recheck, false, NULL);
+ recheck, false, NULL, leafTuple);
+
*reportedSome = true;
}
}
@@ -736,7 +762,7 @@ spgTestLeafTuple(SpGistScanOpaque so,
/* dead tuple should be first in chain */
Assert(offset == ItemPointerGetOffsetNumber(&item->heapPtr));
/* No live entries on this page */
- Assert(leafTuple->nextOffset == InvalidOffsetNumber);
+ Assert(SGLT_GET_OFFSET(leafTuple->nextOffset) == InvalidOffsetNumber);
return SpGistBreakOffsetNumber;
}
}
@@ -750,7 +776,7 @@ spgTestLeafTuple(SpGistScanOpaque so,
spgLeafTest(so, item, leafTuple, isnull, reportedSome, storeRes);
- return leafTuple->nextOffset;
+ return SGLT_GET_OFFSET(leafTuple->nextOffset);
}
/*
@@ -782,8 +808,8 @@ redirect:
{
/* We store heap items in the queue only in case of ordered search */
Assert(so->numberOfNonNullOrderBys > 0);
- storeRes(so, &item->heapPtr, item->value, item->isNull,
- item->recheck, item->recheckDistances, item->distances);
+ storeRes(so, &item->heapPtr, item->value, item->isNull, item->recheck,
+ item->recheckDistances, item->distances, item->leafTuple);
reportedSome = true;
}
else
@@ -877,7 +903,7 @@ redirect:
static void
storeBitmap(SpGistScanOpaque so, ItemPointer heapPtr,
Datum leafValue, bool isnull, bool recheck, bool recheckDistances,
- double *distances)
+ double *distances, SpGistLeafTuple leafTuple)
{
Assert(!recheckDistances && !distances);
tbm_add_tuples(so->tbm, heapPtr, 1, recheck);
@@ -904,7 +930,7 @@ spggetbitmap(IndexScanDesc scan, TIDBitmap *tbm)
static void
storeGettuple(SpGistScanOpaque so, ItemPointer heapPtr,
Datum leafValue, bool isnull, bool recheck, bool recheckDistances,
- double *nonNullDistances)
+ double *nonNullDistances, SpGistLeafTuple leafTuple)
{
Assert(so->nPtrs < MaxIndexTuplesPerPage);
so->heapPtrs[so->nPtrs] = *heapPtr;
@@ -949,9 +975,38 @@ storeGettuple(SpGistScanOpaque so, ItemPointer heapPtr,
* Reconstruct index data. We have to copy the datum out of the temp
* context anyway, so we may as well create the tuple here.
*/
- so->reconTups[so->nPtrs] = heap_form_tuple(so->indexTupDesc,
- &leafValue,
- &isnull);
+ if (so->state.includeTupdesc)
+ {
+ /* Add included attributes */
+ Datum *leafDatums;
+ bool *leafIsnulls;
+
+ Assert(so->state.includeTupdesc->natts);
+
+ leafDatums = (Datum *) palloc(sizeof(Datum) * (so->state.includeTupdesc->natts + 1));
+ leafIsnulls = (bool *) palloc(sizeof(bool) * (so->state.includeTupdesc->natts + 1));
+
+ SpGistDeformLeafTuple(leafTuple, &so->state, leafDatums, leafIsnulls, isnull);
+
+ /*
+ * override key value extracted from LeafTuple in case we've
+ * reconstructed it already
+ */
+ leafDatums[0] = leafValue;
+ leafIsnulls[0] = isnull;
+
+ so->reconTups[so->nPtrs] = heap_form_tuple(so->indexTupDesc,
+ leafDatums,
+ leafIsnulls);
+ pfree(leafDatums);
+ pfree(leafIsnulls);
+ }
+ else
+ {
+ so->reconTups[so->nPtrs] = heap_form_tuple(so->indexTupDesc,
+ &leafValue,
+ &isnull);
+ }
}
so->nPtrs++;
}
@@ -1019,6 +1074,10 @@ spgcanreturn(Relation index, int attno)
{
SpGistCache *cache;
+ /* Included attributes always can be fetched for index-only scans */
+ if (attno > 1)
+ return true;
+
/* We can do it if the opclass config function says so */
cache = spgGetCache(index);
diff --git a/src/backend/access/spgist/spgutils.c b/src/backend/access/spgist/spgutils.c
index 0efe05e552..9b1633eeda 100644
--- a/src/backend/access/spgist/spgutils.c
+++ b/src/backend/access/spgist/spgutils.c
@@ -31,7 +31,18 @@
#include "utils/index_selfuncs.h"
#include "utils/lsyscache.h"
#include "utils/syscache.h"
+#include "access/itup.h"
+#include "access/detoast.h"
+#include "access/toast_internals.h"
+#include "access/heaptoast.h"
+#include "utils/expandeddatum.h"
+/* Does att's datatype allow packing into the 1-byte-header varlena format? */
+#define ATT_IS_PACKABLE(att) \
+ ((att)->attlen == -1 && (att)->attstorage != TYPSTORAGE_PLAIN)
+
+Size spgIncludedDataSize(TupleDesc tupleDesc, Datum *values,
+ bool *isnull, Size start);
/*
* SP-GiST handler function: return IndexAmRoutine with access method parameters
@@ -49,7 +60,7 @@ spghandler(PG_FUNCTION_ARGS)
amroutine->amcanorderbyop = true;
amroutine->amcanbackward = false;
amroutine->amcanunique = false;
- amroutine->amcanmulticol = false;
+ amroutine->amcanmulticol = true;
amroutine->amoptionalkey = true;
amroutine->amsearcharray = false;
amroutine->amsearchnulls = true;
@@ -57,7 +68,7 @@ spghandler(PG_FUNCTION_ARGS)
amroutine->amclusterable = false;
amroutine->ampredlocks = false;
amroutine->amcanparallel = false;
- amroutine->amcaninclude = false;
+ amroutine->amcaninclude = true;
amroutine->amusemaintenanceworkmem = false;
amroutine->amparallelvacuumoptions =
VACUUM_OPTION_PARALLEL_BULKDEL | VACUUM_OPTION_PARALLEL_COND_CLEANUP;
@@ -116,14 +127,21 @@ spgGetCache(Relation index)
cache = MemoryContextAllocZero(index->rd_indexcxt,
sizeof(SpGistCache));
- /* SPGiST doesn't support multi-column indexes */
- Assert(index->rd_att->natts == 1);
+ /*
+ * SPGiST should have one key column and can also have included
+ * columns
+ */
+ if (IndexRelationGetNumberOfKeyAttributes(index) != 1)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("SPGiST index can have only one key column")));
/*
- * Get the actual data type of the indexed column from the index
- * tupdesc. We pass this to the opclass config function so that
- * polymorphic opclasses are possible.
+ * Get the actual data type of the key column from the index tupdesc.
+ * We pass this to the opclass config function so that polymorphic
+ * opclasses are possible.
*/
+
atttype = TupleDescAttr(index->rd_att, 0)->atttypid;
/* Call the config function to get config info for the opclass */
@@ -156,6 +174,7 @@ spgGetCache(Relation index)
fillTypeDesc(&cache->attPrefixType, cache->config.prefixType);
fillTypeDesc(&cache->attLabelType, cache->config.labelType);
+
/* Last, get the lastUsedPages data from the metapage */
metabuffer = ReadBuffer(index, SPGIST_METAPAGE_BLKNO);
LockBuffer(metabuffer, BUFFER_LOCK_SHARE);
@@ -177,7 +196,23 @@ spgGetCache(Relation index)
/* assume it's up to date */
cache = (SpGistCache *) index->rd_amcache;
}
+ /* Form descriptor for included columns if any */
+ if (IndexRelationGetNumberOfAttributes(index) > 1)
+ {
+ int i;
+
+ cache->includeTupdesc = CreateTemplateTupleDesc(
+ IndexRelationGetNumberOfAttributes(index) - 1);
+ for (i = 0; i < IndexRelationGetNumberOfAttributes(index) - 1; i++)
+ {
+ TupleDescInitEntry(cache->includeTupdesc, i + 1, NULL,
+ TupleDescAttr(index->rd_att, i + 1)->atttypid,
+ -1, 0);
+ }
+ }
+ else
+ cache->includeTupdesc = NULL;
return cache;
}
@@ -190,6 +225,7 @@ initSpGistState(SpGistState *state, Relation index)
/* Get cached static information about index */
cache = spgGetCache(index);
+ state->includeTupdesc = cache->includeTupdesc;
state->config = cache->config;
state->attType = cache->attType;
state->attLeafType = cache->attLeafType;
@@ -603,7 +639,7 @@ spgoptions(Datum reloptions, bool validate)
/*
* Get the space needed to store a non-null datum of the indicated type.
- * Note the result is already rounded up to a MAXALIGN boundary.
+ * Note the result is not maxaligned and this should be done by caller if needed.
* Also, we follow the SPGiST convention that pass-by-val types are
* just stored in their Datum representation (compare memcpyDatum).
*/
@@ -619,7 +655,7 @@ SpGistGetTypeSize(SpGistTypeDesc *att, Datum datum)
else
size = VARSIZE_ANY(datum);
- return MAXALIGN(size);
+ return size;
}
/*
@@ -642,36 +678,205 @@ memcpyDatum(void *target, SpGistTypeDesc *att, Datum datum)
}
/*
- * Construct a leaf tuple containing the given heap TID and datum value
+ * Private version of heap_compute_data_size with start address not
+ * at MAXALIGN boundary. The reason is that start address (and alignment)
+ * influence alignment of each of next values and overall size of included
+ * data area in SpGiST leaf tuple. MAXALINGing first include attribute is
+ * avoided for not to introduce unnecessary gap before it.
+ */
+Size
+spgIncludedDataSize(TupleDesc tupleDesc,
+ Datum *values,
+ bool *isnull, Size start)
+{
+ Size data_length = 0;
+ int i;
+ int numberOfAttributes = tupleDesc->natts;
+
+ data_length = start;
+ for (i = 0; i < numberOfAttributes; i++)
+ {
+ Datum val;
+ Form_pg_attribute atti;
+
+ if (isnull[i])
+ continue;
+
+ val = values[i];
+ atti = TupleDescAttr(tupleDesc, i);
+
+ if (ATT_IS_PACKABLE(atti) &&
+ VARATT_CAN_MAKE_SHORT(DatumGetPointer(val)))
+ {
+ /*
+ * we're anticipating converting to a short varlena header, so
+ * adjust length and don't count any alignment
+ */
+ data_length += VARATT_CONVERTED_SHORT_SIZE(DatumGetPointer(val));
+ }
+ else if (atti->attlen == -1 &&
+ VARATT_IS_EXTERNAL_EXPANDED(DatumGetPointer(val)))
+ {
+ /*
+ * we want to flatten the expanded value so that the constructed
+ * tuple doesn't depend on it
+ */
+ data_length = att_align_nominal(data_length, atti->attalign);
+ data_length += EOH_get_flat_size(DatumGetEOHP(val));
+ }
+ else
+ {
+ data_length = att_align_datum(data_length, atti->attalign,
+ atti->attlen, val);
+ data_length = att_addlength_datum(data_length, atti->attlen,
+ val);
+ }
+ }
+ return data_length - start;
+}
+
+/* Calculate overall leaf tuple size. SGLTHDRSZ is MAXALIGNed for backward
+ * compatibility and there might be gap between header and key data. After key
+ * data there are no such gaps more than is is necessary for each value
+ * alignment. Overall result is MAXALIGNed which is anyway unavoidable
+ * when placing a tuple on a page.
+ */
+unsigned int
+SpgLeafSize(SpGistState *state, Datum *datum, bool *isnull)
+{
+ /* compute space needed, nullmask size and offset for include attributes */
+ unsigned int size = SGLTHDRSZ;
+ unsigned int i;
+
+ if (!isnull[0])
+ /* key attribute size (not maxaligned) */
+ size += SpGistGetTypeSize(&state->attLeafType, datum[0]);
+
+ if (state->includeTupdesc)
+ {
+ Assert(state->includeTupdesc->natts);
+ if (state->includeTupdesc->natts + 1 >= INDEX_MAX_KEYS)
+ ereport(ERROR,
+ (errcode(ERRCODE_TOO_MANY_COLUMNS),
+ errmsg("number of index columns (%d) exceeds limit (%d)",
+ state->includeTupdesc->natts, INDEX_MAX_KEYS)));
+ /* nullmask size */
+ for (i = 1; i <= state->includeTupdesc->natts; i++)
+ {
+ if (isnull[i])
+ {
+ size += (state->includeTupdesc->natts / 8) + 1;
+ break;
+ }
+ }
+ /* overall included attributes size each with added proper alignment. */
+ size += spgIncludedDataSize(state->includeTupdesc, datum + 1, isnull + 1, size);
+ }
+ return MAXALIGN(size);
+}
+
+/*
+ * Construct a leaf tuple containing the given heap TID, key data and included
+ * columns data. Key data starts from MAXALIGN boundary for backward compatibility.
+ * Nullmask apply only to included attributes and is placed just after key data if
+ * there is at least one NULL among included attributes. It doesn't need alignment.
+ * Then all included columns data follow aligned by their typealign's.
*/
SpGistLeafTuple
spgFormLeafTuple(SpGistState *state, ItemPointer heapPtr,
- Datum datum, bool isnull)
+ Datum *datum, bool *isnull)
{
SpGistLeafTuple tup;
- unsigned int size;
+ unsigned int size = SGLTHDRSZ;
+ unsigned int include_offset = 0;
+ unsigned int nullmask_size = 0;
+ unsigned int data_offset = 0;
+ unsigned int data_size = 0;
+ uint16 tupmask = 0;
+ int i;
- /* compute space needed (note result is already maxaligned) */
- size = SGLTHDRSZ;
- if (!isnull)
- size += SpGistGetTypeSize(&state->attLeafType, datum);
+ /*
+ * Calculate space needed. If there are include attributes also calculate
+ * sizes and offsets needed for heap_fill_tuple
+ */
+ if (!isnull[0])
+ /* key attribute size (not maxaligned) */
+ size += SpGistGetTypeSize(&state->attLeafType, datum[0]);
+
+ if (state->includeTupdesc)
+ {
+ Assert(state->includeTupdesc->natts);
+ if (state->includeTupdesc->natts + 1 >= INDEX_MAX_KEYS)
+ ereport(ERROR,
+ (errcode(ERRCODE_TOO_MANY_COLUMNS),
+ errmsg("number of index columns (%d) exceeds limit (%d)",
+ state->includeTupdesc->natts, INDEX_MAX_KEYS)));
+
+ include_offset = size;
+
+ for (i = 1; i <= state->includeTupdesc->natts; i++)
+ {
+ if (isnull[i])
+ {
+ nullmask_size = (state->includeTupdesc->natts / 8) + 1;
+ size += nullmask_size;
+ break;
+ }
+ }
+
+ /*
+ * Alignment of all included attributes is counted inside data_size.
+ * data_offset itself is not aligned.
+ */
+ data_size = spgIncludedDataSize(state->includeTupdesc, datum + 1, isnull + 1, size);
+ data_offset = size;
+
+ size += data_size;
+ }
/*
* Ensure that we can replace the tuple with a dead tuple later. This
- * test is unnecessary when !isnull, but let's be safe.
+ * test is unnecessary when !isnull[0], but let's be safe.
*/
if (size < SGDTSIZE)
size = SGDTSIZE;
/* OK, form the tuple */
- tup = (SpGistLeafTuple) palloc0(size);
+ tup = (SpGistLeafTuple) palloc0(MAXALIGN(size));
- tup->size = size;
- tup->nextOffset = InvalidOffsetNumber;
+ tup->size = MAXALIGN(size);
+ SGLT_SET_OFFSET(tup->nextOffset, InvalidOffsetNumber);
tup->heapPtr = *heapPtr;
- if (!isnull)
- memcpyDatum(SGLTDATAPTR(tup), &state->attLeafType, datum);
+ if (!isnull[0])
+ memcpyDatum(SGLTDATAPTR(tup), &state->attLeafType, datum[0]);
+
+ /* Add included columns data to leaf tuple if any. */
+ if (state->includeTupdesc)
+ {
+ /*
+ * The start of include attributes tuple is not aligned by default.
+ * All values alignment should be done by heap_fill_tuple
+ * automaticaly. If there is a nulls mask it is included just after
+ * key attribute data and it should not be aligned.
+ */
+ heap_fill_tuple(state->includeTupdesc, datum + 1, isnull + 1,
+ (char *) tup + data_offset,
+ data_size, &tupmask,
+ (nullmask_size ? (bits8 *) tup + include_offset : NULL));
+
+ if (nullmask_size)
+ SGLT_SET_CONTAINSNULLMASK(tup->nextOffset, 1);
+
+ /*
+ * We do this because heap_fill_tuple wants to initialize a "tupmask"
+ * which is used for HeapTuples, but the only relevant info is the
+ * "has variable attributes" field. We have already set the hasnull
+ * bit above.
+ */
+ if (tupmask & HEAP_HASVARWIDTH)
+ SGLT_SET_CONTAINSVARATT(tup->nextOffset, 1);
+ }
return tup;
}
@@ -688,10 +893,10 @@ spgFormNodeTuple(SpGistState *state, Datum label, bool isnull)
unsigned int size;
unsigned short infomask = 0;
- /* compute space needed (note result is already maxaligned) */
+ /* compute space needed */
size = SGNTHDRSZ;
if (!isnull)
- size += SpGistGetTypeSize(&state->attLabelType, label);
+ size += MAXALIGN(SpGistGetTypeSize(&state->attLabelType, label));
/*
* Here we make sure that the size will fit in the field reserved for it
@@ -735,7 +940,7 @@ spgFormInnerTuple(SpGistState *state, bool hasPrefix, Datum prefix,
/* Compute size needed */
if (hasPrefix)
- prefixSize = SpGistGetTypeSize(&state->attPrefixType, prefix);
+ prefixSize = MAXALIGN(SpGistGetTypeSize(&state->attPrefixType, prefix));
else
prefixSize = 0;
@@ -1046,3 +1251,133 @@ spgproperty(Oid index_oid, int attno,
return true;
}
+
+/*
+ * Convert an SpGist tuple into palloc'd Datum/isnull arrays.
+ *
+ */
+void
+SpGistDeformLeafTuple(SpGistLeafTuple tup, SpGistState *state, Datum *datum, bool *isnull,
+ bool key_isnull)
+{
+ unsigned int include_offset; /* offset of include data */
+ int off;
+ bits8 *nullmask_ptr = NULL; /* ptr to null bitmap in tuple */
+ char *tp;
+ bool slow = false; /* can we use/set attcacheoff? */
+ int i;
+
+ if (key_isnull)
+ {
+ datum[0] = (Datum) 0;
+ isnull[0] = true;
+ }
+ else
+ {
+ datum[0] = SGLTDATUM(tup, state);
+ isnull[0] = false;
+ }
+
+ if (state->includeTupdesc)
+ {
+ Assert(state->includeTupdesc->natts);
+ if (state->includeTupdesc->natts + 1 >= INDEX_MAX_KEYS)
+ ereport(ERROR,
+ (errcode(ERRCODE_TOO_MANY_COLUMNS),
+ errmsg("number of index columns (%d) exceeds limit (%d)",
+ state->includeTupdesc->natts, INDEX_MAX_KEYS)));
+
+ include_offset = key_isnull ? SGLTHDRSZ : SGLTHDRSZ + SpGistGetTypeSize(&state->attLeafType, datum[0]);
+
+ tp = (char *) tup;
+ off = include_offset;
+
+ if (SGLT_GET_CONTAINSNULLMASK(tup->nextOffset))
+ {
+ nullmask_ptr = (bits8 *) tp + include_offset;
+ off += (state->includeTupdesc->natts) / 8 + 1;
+ }
+
+ if (state->attLeafType.attlen > 0 && !SGLT_GET_CONTAINSVARATT(tup->nextOffset) &&
+ !SGLT_GET_CONTAINSNULLMASK(tup->nextOffset))
+ /* can use attcacheoff for all attributes */
+ {
+ for (i = 1; i <= state->includeTupdesc->natts; i++)
+ {
+ Form_pg_attribute thisatt = TupleDescAttr(state->includeTupdesc, i - 1);
+
+ isnull[i] = false;
+ if (thisatt->attcacheoff >= 0)
+ off = thisatt->attcacheoff;
+ else
+ {
+ off = att_align_nominal(off, thisatt->attalign);
+ thisatt->attcacheoff = off;
+ }
+ datum[i] = fetchatt(thisatt, tp + off);
+ off = att_addlength_pointer(off, thisatt->attlen, tp + off);
+ }
+ }
+ else
+
+ /*
+ * general case: can use cache until first null or varlen
+ * attribute
+ */
+ {
+ if (state->attLeafType.attlen <= 0)
+ slow = true; /* can't use attcacheoff at all */
+
+ for (i = 1; i <= state->includeTupdesc->natts; i++)
+ {
+ Form_pg_attribute thisatt = TupleDescAttr(state->includeTupdesc, i - 1);
+
+ if (SGLT_GET_CONTAINSNULLMASK(tup->nextOffset))
+ {
+ if (att_isnull(i - 1, nullmask_ptr))
+ {
+ datum[i] = (Datum) 0;
+ isnull[i] = true;
+ slow = true; /* can't use attcacheoff anymore */
+ continue;
+ }
+ }
+
+ isnull[i] = false;
+
+ if (!slow && thisatt->attcacheoff >= 0)
+ off = thisatt->attcacheoff;
+ else if (thisatt->attlen == -1)
+ {
+ /*
+ * We can only cache the offset for a varlena attribute if
+ * the offset is already suitably aligned, so that there
+ * would be no pad bytes in any case: then the offset will
+ * be valid for either an aligned or unaligned value.
+ */
+ if (!slow && off == att_align_nominal(off, thisatt->attalign))
+ thisatt->attcacheoff = off;
+ else
+ {
+ off = att_align_pointer(off, thisatt->attalign, -1, tp + off);
+ slow = true;
+ }
+ }
+ else
+ {
+ /* not varlena, so safe to use att_align_nominal */
+ off = att_align_nominal(off, thisatt->attalign);
+
+ if (!slow)
+ thisatt->attcacheoff = off;
+ }
+
+ datum[i] = fetchatt(thisatt, tp + off);
+ off = att_addlength_pointer(off, thisatt->attlen, tp + off);
+
+ if (thisatt->attlen <= 0)
+ slow = true; /* can't use attcacheoff anymore */
+ }
+ }
+ }
+}
diff --git a/src/backend/access/spgist/spgvacuum.c b/src/backend/access/spgist/spgvacuum.c
index bd98707f3c..a0d76901fc 100644
--- a/src/backend/access/spgist/spgvacuum.c
+++ b/src/backend/access/spgist/spgvacuum.c
@@ -168,23 +168,28 @@ vacuumLeafPage(spgBulkDeleteState *bds, Relation index, Buffer buffer,
}
/* Form predecessor map, too */
- if (lt->nextOffset != InvalidOffsetNumber)
+ if (SGLT_GET_OFFSET(lt->nextOffset) != InvalidOffsetNumber)
{
/* paranoia about corrupted chain links */
- if (lt->nextOffset < FirstOffsetNumber ||
- lt->nextOffset > max ||
- predecessor[lt->nextOffset] != InvalidOffsetNumber)
+ if (SGLT_GET_OFFSET(lt->nextOffset) < FirstOffsetNumber ||
+ SGLT_GET_OFFSET(lt->nextOffset) > max ||
+ predecessor[SGLT_GET_OFFSET(lt->nextOffset)] != InvalidOffsetNumber)
elog(ERROR, "inconsistent tuple chain links in page %u of index \"%s\"",
BufferGetBlockNumber(buffer),
RelationGetRelationName(index));
- predecessor[lt->nextOffset] = i;
+ predecessor[SGLT_GET_OFFSET(lt->nextOffset)] = i;
}
}
else if (lt->tupstate == SPGIST_REDIRECT)
{
SpGistDeadTuple dt = (SpGistDeadTuple) lt;
- Assert(dt->nextOffset == InvalidOffsetNumber);
+ /*
+ * Dead tuple nextOffset is allowed to have any values of two
+ * highest bits in case it is inherited from SpGistLeafTuple where
+ * these bits has their own meaning.
+ */
+ Assert(SGLT_GET_OFFSET(dt->nextOffset) == InvalidOffsetNumber);
Assert(ItemPointerIsValid(&dt->pointer));
/*
@@ -201,7 +206,7 @@ vacuumLeafPage(spgBulkDeleteState *bds, Relation index, Buffer buffer,
}
else
{
- Assert(lt->nextOffset == InvalidOffsetNumber);
+ Assert(SGLT_GET_OFFSET(lt->nextOffset) == InvalidOffsetNumber);
}
}
@@ -250,7 +255,7 @@ vacuumLeafPage(spgBulkDeleteState *bds, Relation index, Buffer buffer,
prevLive = deletable[i] ? InvalidOffsetNumber : i;
/* scan down the chain ... */
- j = head->nextOffset;
+ j = SGLT_GET_OFFSET(head->nextOffset);
while (j != InvalidOffsetNumber)
{
SpGistLeafTuple lt;
@@ -301,7 +306,7 @@ vacuumLeafPage(spgBulkDeleteState *bds, Relation index, Buffer buffer,
interveningDeletable = false;
}
- j = lt->nextOffset;
+ j = SGLT_GET_OFFSET(lt->nextOffset);
}
if (prevLive == InvalidOffsetNumber)
@@ -366,7 +371,7 @@ vacuumLeafPage(spgBulkDeleteState *bds, Relation index, Buffer buffer,
lt = (SpGistLeafTuple) PageGetItem(page,
PageGetItemId(page, chainSrc[i]));
Assert(lt->tupstate == SPGIST_LIVE);
- lt->nextOffset = chainDest[i];
+ SGLT_SET_OFFSET(lt->nextOffset, chainDest[i]);
}
MarkBufferDirty(buffer);
diff --git a/src/backend/access/spgist/spgxlog.c b/src/backend/access/spgist/spgxlog.c
index 7be2291d07..4022e3af07 100644
--- a/src/backend/access/spgist/spgxlog.c
+++ b/src/backend/access/spgist/spgxlog.c
@@ -122,8 +122,8 @@ spgRedoAddLeaf(XLogReaderState *record)
head = (SpGistLeafTuple) PageGetItem(page,
PageGetItemId(page, xldata->offnumHeadLeaf));
- Assert(head->nextOffset == leafTupleHdr.nextOffset);
- head->nextOffset = xldata->offnumLeaf;
+ Assert(SGLT_GET_OFFSET(head->nextOffset) == SGLT_GET_OFFSET(leafTupleHdr.nextOffset));
+ SGLT_SET_OFFSET(head->nextOffset, xldata->offnumLeaf);
}
}
else
@@ -822,7 +822,7 @@ spgRedoVacuumLeaf(XLogReaderState *record)
lt = (SpGistLeafTuple) PageGetItem(page,
PageGetItemId(page, chainSrc[i]));
Assert(lt->tupstate == SPGIST_LIVE);
- lt->nextOffset = chainDest[i];
+ SGLT_SET_OFFSET(lt->nextOffset, chainDest[i]);
}
PageSetLSN(page, lsn);
diff --git a/src/include/access/spgist_private.h b/src/include/access/spgist_private.h
index 00b98ec6a0..f55549fada 100644
--- a/src/include/access/spgist_private.h
+++ b/src/include/access/spgist_private.h
@@ -22,7 +22,6 @@
#include "utils/geo_decls.h"
#include "utils/relcache.h"
-
typedef struct SpGistOptions
{
int32 varlena_header_; /* varlena header (do not touch directly!) */
@@ -141,6 +140,7 @@ typedef struct SpGistState
SpGistTypeDesc attLeafType; /* type of leaf-tuple values */
SpGistTypeDesc attPrefixType; /* type of inner-tuple prefix values */
SpGistTypeDesc attLabelType; /* type of node label values */
+ TupleDesc includeTupdesc; /* tuple descriptor of included columns */
char *deadTupleStorage; /* workspace for spgFormDeadTuple */
@@ -148,104 +148,6 @@ typedef struct SpGistState
bool isBuild; /* true if doing index build */
} SpGistState;
-typedef struct SpGistSearchItem
-{
- pairingheap_node phNode; /* pairing heap node */
- Datum value; /* value reconstructed from parent or
- * leafValue if heaptuple */
- void *traversalValue; /* opclass-specific traverse value */
- int level; /* level of items on this page */
- ItemPointerData heapPtr; /* heap info, if heap tuple */
- bool isNull; /* SearchItem is NULL item */
- bool isLeaf; /* SearchItem is heap item */
- bool recheck; /* qual recheck is needed */
- bool recheckDistances; /* distance recheck is needed */
-
- /* array with numberOfOrderBys entries */
- double distances[FLEXIBLE_ARRAY_MEMBER];
-} SpGistSearchItem;
-
-#define SizeOfSpGistSearchItem(n_distances) \
- (offsetof(SpGistSearchItem, distances) + sizeof(double) * (n_distances))
-
-/*
- * Private state of an index scan
- */
-typedef struct SpGistScanOpaqueData
-{
- SpGistState state; /* see above */
- pairingheap *scanQueue; /* queue of to be visited items */
- MemoryContext tempCxt; /* short-lived memory context */
- MemoryContext traversalCxt; /* single scan lifetime memory context */
-
- /* Control flags showing whether to search nulls and/or non-nulls */
- bool searchNulls; /* scan matches (all) null entries */
- bool searchNonNulls; /* scan matches (some) non-null entries */
-
- /* Index quals to be passed to opclass (null-related quals removed) */
- int numberOfKeys; /* number of index qualifier conditions */
- ScanKey keyData; /* array of index qualifier descriptors */
- int numberOfOrderBys; /* number of ordering operators */
- int numberOfNonNullOrderBys; /* number of ordering operators
- * with non-NULL arguments */
- ScanKey orderByData; /* array of ordering op descriptors */
- Oid *orderByTypes; /* array of ordering op return types */
- int *nonNullOrderByOffsets; /* array of offset of non-NULL
- * ordering keys in the original array */
- Oid indexCollation; /* collation of index column */
-
- /* Opclass defined functions: */
- FmgrInfo innerConsistentFn;
- FmgrInfo leafConsistentFn;
-
- /* Pre-allocated workspace arrays: */
- double *zeroDistances;
- double *infDistances;
-
- /* These fields are only used in amgetbitmap scans: */
- TIDBitmap *tbm; /* bitmap being filled */
- int64 ntids; /* number of TIDs passed to bitmap */
-
- /* These fields are only used in amgettuple scans: */
- bool want_itup; /* are we reconstructing tuples? */
- TupleDesc indexTupDesc; /* if so, tuple descriptor for them */
- int nPtrs; /* number of TIDs found on current page */
- int iPtr; /* index for scanning through same */
- ItemPointerData heapPtrs[MaxIndexTuplesPerPage]; /* TIDs from cur page */
- bool recheck[MaxIndexTuplesPerPage]; /* their recheck flags */
- bool recheckDistances[MaxIndexTuplesPerPage]; /* distance recheck
- * flags */
- HeapTuple reconTups[MaxIndexTuplesPerPage]; /* reconstructed tuples */
-
- /* distances (for recheck) */
- IndexOrderByDistance *distances[MaxIndexTuplesPerPage];
-
- /*
- * Note: using MaxIndexTuplesPerPage above is a bit hokey since
- * SpGistLeafTuples aren't exactly IndexTuples; however, they are larger,
- * so this is safe.
- */
-} SpGistScanOpaqueData;
-
-typedef SpGistScanOpaqueData *SpGistScanOpaque;
-
-/*
- * This struct is what we actually keep in index->rd_amcache. It includes
- * static configuration information as well as the lastUsedPages cache.
- */
-typedef struct SpGistCache
-{
- spgConfigOut config; /* filled in by opclass config method */
-
- SpGistTypeDesc attType; /* type of values to be indexed/restored */
- SpGistTypeDesc attLeafType; /* type of leaf-tuple values */
- SpGistTypeDesc attPrefixType; /* type of inner-tuple prefix values */
- SpGistTypeDesc attLabelType; /* type of node label values */
-
- SpGistLUPCache lastUsedPages; /* local storage of last-used info */
-} SpGistCache;
-
-
/*
* SPGiST tuple types. Note: inner, leaf, and dead tuple structs
* must have the same tupstate field in the same position! Real inner and
@@ -305,8 +207,8 @@ typedef SpGistInnerTupleData *SpGistInnerTuple;
* SPGiST node tuple: one node within an inner tuple
*
* Node tuples use the same header as ordinary Postgres IndexTuples, but
- * we do not use a null bitmap, because we know there is only one column
- * so the INDEX_NULL_MASK bit suffices. Also, pass-by-value datums are
+ * we do not use a null bitmap, because we know there is only one key column
+ * so the INDEX_NULL_MASK bit suffices. Also, pass-by-value datums are
* stored as a full Datum, the same convention as for inner tuple prefixes
* and leaf tuple datums.
*/
@@ -322,11 +224,13 @@ typedef SpGistNodeTupleData *SpGistNodeTuple;
PointerGetDatum(SGNTDATAPTR(x)))
/*
- * SPGiST leaf tuple: carries a datum and a heap tuple TID
+ * SPGiST leaf tuple: carries a key datum, a heap tuple TID and optional
+ * datums and nullmask of included columns.
*
- * In the simplest case, the datum is the same as the indexed value; but
+ * In the simplest case, the key datum is the same as the indexed value; but
* it could also be a suffix or some other sort of delta that permits
* reconstruction given knowledge of the prefix path traversed to get here.
+ * Datums of included columns are stored without modification.
*
* The size field is wider than could possibly be needed for an on-disk leaf
* tuple, but this allows us to form leaf tuples even when the datum is too
@@ -346,14 +250,44 @@ typedef SpGistNodeTupleData *SpGistNodeTuple;
* however, the SGDTSIZE limit ensures that's there's a Datum word there
* anyway, so SGLTDATUM can be applied safely as long as you don't do
* anything with the result.
+ *
+ * Minimum space to store SpGistLeafTuple plus ItemIdData on a page is 16 bytes,
+ * so 14 lower bits of nextOffset is enough to store tuple number in a chain
+ * on a page even if page size is 64Kb. Two higher bits are to store per-tuple
+ * information for included attributes: is there nulls mask exist, and is there
+ * any included attribute of variable length type. If there are no included
+ * columns these higher bits are not used.
+ *
+ * If there are included columns, they are stored after a key value each starting
+ * from its own typalign boundary. Unlike IndexTuple, first included value does
+ * not need to be stored, starting from MAXALIGN boundary, and SPGiST uses
+ * private routines to access them. Nullmask with size of
+ * (number of included columns)/8 bytes is put without alignment between key
+ * and first included column. If there is an alignment gap between them,
+ * nullmask has a good chance to fit into the gap, thus making its storage free of
+ * charge.
*/
+
typedef struct SpGistLeafTupleData
{
unsigned int tupstate:2, /* LIVE/REDIRECT/DEAD/PLACEHOLDER */
size:30; /* large enough for any palloc'able value */
- OffsetNumber nextOffset; /* next tuple in chain, or InvalidOffsetNumber */
+
+ /* ---------------
+ * nextOffset is laid out in the following fashion:
+ *
+ * 15th (high) bit: included values has nulls
+ * 14th bit: included values has var-length attributes
+ * 13-0 bit: number of next tuple in chain on a page, or InvalidOffsetNumber
+ * ---------------
+ */
+
+ unsigned short nextOffset; /* info for linking tuples in a chain on a leaf page,
+ and additional info for included attributes */
ItemPointerData heapPtr; /* TID of represented heap tuple */
- /* leaf datum follows */
+ /* key column data follows */
+ /* nullmask of included values follows if there are nulls in included attributes*/
+ /* included columns data follow if any */
} SpGistLeafTupleData;
typedef SpGistLeafTupleData *SpGistLeafTuple;
@@ -361,8 +295,17 @@ typedef SpGistLeafTupleData *SpGistLeafTuple;
#define SGLTHDRSZ MAXALIGN(sizeof(SpGistLeafTupleData))
#define SGLTDATAPTR(x) (((char *) (x)) + SGLTHDRSZ)
#define SGLTDATUM(x, s) ((s)->attLeafType.attbyval ? \
- *(Datum *) SGLTDATAPTR(x) : \
- PointerGetDatum(SGLTDATAPTR(x)))
+ *(Datum *) SGLTDATAPTR(x) : \
+ PointerGetDatum(SGLTDATAPTR(x)))
+/*
+ * Macros to access bit fields inside nextOffset independently.
+ */
+#define SGLT_GET_OFFSET(x) ( (x) & 0x3FFF )
+#define SGLT_GET_CONTAINSNULLMASK(x) ( (x) >> 15 )
+#define SGLT_GET_CONTAINSVARATT(x) ( ( (x) & 0x4000 ) >> 14 )
+#define SGLT_SET_OFFSET(x,o) ( (x) = ( (x) & 0xC000 ) | ( (o) & 0x3FFF) )
+#define SGLT_SET_CONTAINSNULLMASK(x,n) ( (x) = ( (n) << 15 ) | ( (x) & 0x3FFF ) )
+#define SGLT_SET_CONTAINSVARATT(x,v) ( (x) = ( (v) << 14 ) | ( (x) & 0xBFFF ) )
/*
* SPGiST dead tuple: declaration for examining non-live tuples
@@ -373,7 +316,6 @@ typedef SpGistLeafTupleData *SpGistLeafTuple;
* field, to satisfy some Asserts that we make when replacing a leaf tuple
* with a dead tuple.
* We don't use nextOffset, but it's needed to align the pointer field.
- * pointer and xid are only valid when tupstate = REDIRECT.
*/
typedef struct SpGistDeadTupleData
{
@@ -394,7 +336,6 @@ typedef SpGistDeadTupleData *SpGistDeadTuple;
* size plus sizeof(ItemIdData) (for the line pointer). This works correctly
* so long as tuple sizes are always maxaligned.
*/
-
/* Page capacity after allowing for fixed header and special space */
#define SPGIST_PAGE_CAPACITY \
MAXALIGN_DOWN(BLCKSZ - \
@@ -410,6 +351,105 @@ typedef SpGistDeadTupleData *SpGistDeadTuple;
Min(SpGistPageGetOpaque(p)->nPlaceholder, n) * \
(SGDTSIZE + sizeof(ItemIdData)))
+
+typedef struct SpGistSearchItem
+{
+ pairingheap_node phNode; /* pairing heap node */
+ Datum value; /* value reconstructed from parent or
+ * leafValue if heaptuple */
+ void *traversalValue; /* opclass-specific traverse value */
+ int level; /* level of items on this page */
+ ItemPointerData heapPtr; /* heap info, if heap tuple */
+ bool isNull; /* SearchItem is NULL item */
+ bool isLeaf; /* SearchItem is heap item */
+ bool recheck; /* qual recheck is needed */
+ bool recheckDistances; /* distance recheck is needed */
+ SpGistLeafTuple leafTuple;
+ /* array with numberOfOrderBys entries */
+ double distances[FLEXIBLE_ARRAY_MEMBER];
+ /* if there are include columns SpGistLeafTupleData follow */
+} SpGistSearchItem;
+
+#define SizeOfSpGistSearchItem(n_distances) \
+ (offsetof(SpGistSearchItem, distances) + sizeof(double) * (n_distances))
+/*
+ * Private state of an index scan
+ */
+typedef struct SpGistScanOpaqueData
+{
+ SpGistState state; /* see above */
+ pairingheap *scanQueue; /* queue of to be visited items */
+ MemoryContext tempCxt; /* short-lived memory context */
+ MemoryContext traversalCxt; /* single scan lifetime memory context */
+
+ /* Control flags showing whether to search nulls and/or non-nulls */
+ bool searchNulls; /* scan matches (all) null entries */
+ bool searchNonNulls; /* scan matches (some) non-null entries */
+
+ /* Index quals to be passed to opclass (null-related quals removed) */
+ int numberOfKeys; /* number of index qualifier conditions */
+ ScanKey keyData; /* array of index qualifier descriptors */
+ int numberOfOrderBys; /* number of ordering operators */
+ int numberOfNonNullOrderBys; /* number of ordering operators
+ * with non-NULL arguments */
+ ScanKey orderByData; /* array of ordering op descriptors */
+ Oid *orderByTypes; /* array of ordering op return types */
+ int *nonNullOrderByOffsets; /* array of offset of non-NULL
+ * ordering keys in the original array */
+ Oid indexCollation; /* collation of index column */
+
+ /* Opclass defined functions: */
+ FmgrInfo innerConsistentFn;
+ FmgrInfo leafConsistentFn;
+
+ /* Pre-allocated workspace arrays: */
+ double *zeroDistances;
+ double *infDistances;
+
+ /* These fields are only used in amgetbitmap scans: */
+ TIDBitmap *tbm; /* bitmap being filled */
+ int64 ntids; /* number of TIDs passed to bitmap */
+
+ /* These fields are only used in amgettuple scans: */
+ bool want_itup; /* are we reconstructing tuples? */
+ TupleDesc indexTupDesc; /* if so, tuple descriptor for them */
+ int nPtrs; /* number of TIDs found on current page */
+ int iPtr; /* index for scanning through same */
+ ItemPointerData heapPtrs[MaxIndexTuplesPerPage]; /* TIDs from cur page */
+ bool recheck[MaxIndexTuplesPerPage]; /* their recheck flags */
+ bool recheckDistances[MaxIndexTuplesPerPage]; /* distance recheck
+ * flags */
+ HeapTuple reconTups[MaxIndexTuplesPerPage]; /* reconstructed tuples */
+
+ /* distances (for recheck) */
+ IndexOrderByDistance *distances[MaxIndexTuplesPerPage];
+
+ /*
+ * Note: using MaxIndexTuplesPerPage above is a bit hokey since
+ * SpGistLeafTuples aren't exactly IndexTuples; however, they are larger,
+ * so this is safe.
+ */
+} SpGistScanOpaqueData;
+
+typedef SpGistScanOpaqueData *SpGistScanOpaque;
+
+/*
+ * This struct is what we actually keep in index->rd_amcache. It includes
+ * static configuration information as well as the lastUsedPages cache.
+ */
+typedef struct SpGistCache
+{
+ spgConfigOut config; /* filled in by opclass config method */
+
+ SpGistTypeDesc attType; /* type of values to be indexed/restored */
+ SpGistTypeDesc attLeafType; /* type of leaf-tuple values */
+ SpGistTypeDesc attPrefixType; /* type of inner-tuple prefix values */
+ SpGistTypeDesc attLabelType; /* type of node label values */
+ TupleDesc includeTupdesc;
+
+ SpGistLUPCache lastUsedPages; /* local storage of last-used info */
+} SpGistCache;
+
/*
* XLOG stuff
*/
@@ -456,9 +496,10 @@ extern void SpGistInitPage(Page page, uint16 f);
extern void SpGistInitBuffer(Buffer b, uint16 f);
extern void SpGistInitMetapage(Page page);
extern unsigned int SpGistGetTypeSize(SpGistTypeDesc *att, Datum datum);
+extern unsigned int SpgLeafSize(SpGistState *state, Datum *datum, bool *isnull);
extern SpGistLeafTuple spgFormLeafTuple(SpGistState *state,
ItemPointer heapPtr,
- Datum datum, bool isnull);
+ Datum *datum, bool *isnull);
extern SpGistNodeTuple spgFormNodeTuple(SpGistState *state,
Datum label, bool isnull);
extern SpGistInnerTuple spgFormInnerTuple(SpGistState *state,
@@ -466,6 +507,8 @@ extern SpGistInnerTuple spgFormInnerTuple(SpGistState *state,
int nNodes, SpGistNodeTuple *nodes);
extern SpGistDeadTuple spgFormDeadTuple(SpGistState *state, int tupstate,
BlockNumber blkno, OffsetNumber offnum);
+extern void SpGistDeformLeafTuple(SpGistLeafTuple tup, SpGistState *state,
+ Datum *datum, bool *isnull, bool key_value_isnull);
extern Datum *spgExtractNodeLabels(SpGistState *state,
SpGistInnerTuple innerTuple);
extern OffsetNumber SpGistPageAddNewItem(SpGistState *state, Page page,
@@ -484,7 +527,7 @@ extern void spgPageIndexMultiDelete(SpGistState *state, Page page,
int firststate, int reststate,
BlockNumber blkno, OffsetNumber offnum);
extern bool spgdoinsert(Relation index, SpGistState *state,
- ItemPointer heapPtr, Datum datum, bool isnull);
+ ItemPointer heapPtr, Datum *datum, bool *isnull);
/* spgproc.c */
extern double *spg_key_orderbys_distances(Datum key, bool isLeaf,
diff --git a/src/test/regress/expected/amutils.out b/src/test/regress/expected/amutils.out
index d92a6d12c6..93e6a43b6d 100644
--- a/src/test/regress/expected/amutils.out
+++ b/src/test/regress/expected/amutils.out
@@ -169,9 +169,9 @@ select amname, prop, pg_indexam_has_property(a.oid, prop) as p
hash | bogus |
spgist | can_order | f
spgist | can_unique | f
- spgist | can_multi_col | f
+ spgist | can_multi_col | t
spgist | can_exclude | t
- spgist | can_include | f
+ spgist | can_include | t
spgist | bogus |
(36 rows)
diff --git a/src/test/regress/expected/index_including.out b/src/test/regress/expected/index_including.out
index 8e5d53e712..4fd2b7e878 100644
--- a/src/test/regress/expected/index_including.out
+++ b/src/test/regress/expected/index_including.out
@@ -356,7 +356,6 @@ CREATE INDEX on tbl USING brin(c1, c2) INCLUDE (c3, c4);
ERROR: access method "brin" does not support included columns
CREATE INDEX on tbl USING gist(c3) INCLUDE (c1, c4);
CREATE INDEX on tbl USING spgist(c3) INCLUDE (c4);
-ERROR: access method "spgist" does not support included columns
CREATE INDEX on tbl USING gin(c1, c2) INCLUDE (c3, c4);
ERROR: access method "gin" does not support included columns
CREATE INDEX on tbl USING hash(c1, c2) INCLUDE (c3, c4);
diff --git a/src/test/regress/expected/index_including_spgist.out b/src/test/regress/expected/index_including_spgist.out
new file mode 100644
index 0000000000..fa64766fb7
--- /dev/null
+++ b/src/test/regress/expected/index_including_spgist.out
@@ -0,0 +1,139 @@
+/*
+ * 1.1. test CREATE INDEX with buffered build
+ */
+-- Regular index with included columns
+CREATE TABLE tbl_spgist (c1 int, c2 int, c3 int, c4 box);
+-- size is chosen to exceed page size and trigger actual truncation
+INSERT INTO tbl_spgist SELECT x, 2*x, 3*x, box(point(x,x+1),point(2*x,2*x+1)) FROM generate_series(1,8000) AS x;
+CREATE INDEX tbl_spgist_idx ON tbl_spgist using spgist (c4) INCLUDE (c1,c2,c3);
+SELECT pg_get_indexdef(i.indexrelid)
+FROM pg_index i JOIN pg_class c ON i.indexrelid = c.oid
+WHERE i.indrelid = 'tbl_spgist'::regclass ORDER BY c.relname;
+ pg_get_indexdef
+-----------------------------------------------------------------------------------------
+ CREATE INDEX tbl_spgist_idx ON public.tbl_spgist USING spgist (c4) INCLUDE (c1, c2, c3)
+(1 row)
+
+SELECT * FROM tbl_spgist where c4 <@ box(point(1,1),point(10,10));
+ c1 | c2 | c3 | c4
+----+----+----+-------------
+ 1 | 2 | 3 | (2,3),(1,2)
+ 2 | 4 | 6 | (4,5),(2,3)
+ 3 | 6 | 9 | (6,7),(3,4)
+ 4 | 8 | 12 | (8,9),(4,5)
+(4 rows)
+
+SET enable_bitmapscan TO off;
+EXPLAIN (costs off) SELECT * FROM tbl_spgist where c4 <@ box(point(1,1),point(10,10));
+ QUERY PLAN
+----------------------------------------------------
+ Index Only Scan using tbl_spgist_idx on tbl_spgist
+ Index Cond: (c4 <@ '(10,10),(1,1)'::box)
+(2 rows)
+
+SET enable_bitmapscan TO default;
+DROP TABLE tbl_spgist;
+/*
+ * 1.2. test CREATE INDEX with inserts
+ */
+-- Regular index with included columns
+CREATE TABLE tbl_spgist (c1 int, c2 int, c3 int, c4 box);
+-- size is chosen to exceed page size and trigger actual truncation
+CREATE INDEX tbl_spgist_idx ON tbl_spgist using spgist (c4) INCLUDE (c1,c2,c3);
+INSERT INTO tbl_spgist SELECT x, 2*x, 3*x, box(point(x,x+1),point(2*x,2*x+1)) FROM generate_series(1,8000) AS x;
+SELECT pg_get_indexdef(i.indexrelid)
+FROM pg_index i JOIN pg_class c ON i.indexrelid = c.oid
+WHERE i.indrelid = 'tbl_spgist'::regclass ORDER BY c.relname;
+ pg_get_indexdef
+-----------------------------------------------------------------------------------------
+ CREATE INDEX tbl_spgist_idx ON public.tbl_spgist USING spgist (c4) INCLUDE (c1, c2, c3)
+(1 row)
+
+SELECT * FROM tbl_spgist where c4 <@ box(point(1,1),point(10,10));
+ c1 | c2 | c3 | c4
+----+----+----+-------------
+ 1 | 2 | 3 | (2,3),(1,2)
+ 2 | 4 | 6 | (4,5),(2,3)
+ 3 | 6 | 9 | (6,7),(3,4)
+ 4 | 8 | 12 | (8,9),(4,5)
+(4 rows)
+
+SET enable_bitmapscan TO off;
+EXPLAIN (costs off) SELECT * FROM tbl_spgist where c4 <@ box(point(1,1),point(10,10));
+ QUERY PLAN
+----------------------------------------------------
+ Index Only Scan using tbl_spgist_idx on tbl_spgist
+ Index Cond: (c4 <@ '(10,10),(1,1)'::box)
+(2 rows)
+
+SET enable_bitmapscan TO default;
+DROP TABLE tbl_spgist;
+/*
+ * 2. CREATE INDEX CONCURRENTLY
+ */
+CREATE TABLE tbl_spgist (c1 int, c2 int, c3 int, c4 box);
+INSERT INTO tbl_spgist SELECT x, 2*x, 3*x, box(point(x,x+1),point(2*x,2*x+1)) FROM generate_series(1,10) AS x;
+CREATE INDEX CONCURRENTLY tbl_spgist_idx ON tbl_spgist using spgist (c4) INCLUDE (c1,c2,c3);
+SELECT indexdef FROM pg_indexes WHERE tablename = 'tbl_spgist' ORDER BY indexname;
+ indexdef
+-----------------------------------------------------------------------------------------
+ CREATE INDEX tbl_spgist_idx ON public.tbl_spgist USING spgist (c4) INCLUDE (c1, c2, c3)
+(1 row)
+
+DROP TABLE tbl_spgist;
+/*
+ * 3. REINDEX
+ */
+CREATE TABLE tbl_spgist (c1 int, c2 int, c3 int, c4 box);
+INSERT INTO tbl_spgist SELECT x, 2*x, 3*x, box(point(x,x+1),point(2*x,2*x+1)) FROM generate_series(1,10) AS x;
+CREATE INDEX tbl_spgist_idx ON tbl_spgist using spgist (c4) INCLUDE (c1,c3);
+SELECT indexdef FROM pg_indexes WHERE tablename = 'tbl_spgist' ORDER BY indexname;
+ indexdef
+-------------------------------------------------------------------------------------
+ CREATE INDEX tbl_spgist_idx ON public.tbl_spgist USING spgist (c4) INCLUDE (c1, c3)
+(1 row)
+
+REINDEX INDEX tbl_spgist_idx;
+SELECT indexdef FROM pg_indexes WHERE tablename = 'tbl_spgist' ORDER BY indexname;
+ indexdef
+-------------------------------------------------------------------------------------
+ CREATE INDEX tbl_spgist_idx ON public.tbl_spgist USING spgist (c4) INCLUDE (c1, c3)
+(1 row)
+
+ALTER TABLE tbl_spgist DROP COLUMN c1;
+SELECT indexdef FROM pg_indexes WHERE tablename = 'tbl_spgist' ORDER BY indexname;
+ indexdef
+----------
+(0 rows)
+
+DROP TABLE tbl_spgist;
+/*
+ * 4. Update, delete values in indexed table.
+ */
+CREATE TABLE tbl_spgist (c1 int, c2 int, c3 int, c4 box);
+INSERT INTO tbl_spgist SELECT x, 2*x, 3*x, box(point(x,x+1),point(2*x,2*x+1)) FROM generate_series(1,10) AS x;
+CREATE INDEX tbl_spgist_idx ON tbl_spgist using spgist (c4) INCLUDE (c1,c3);
+UPDATE tbl_spgist SET c1 = 100 WHERE c1 = 2;
+UPDATE tbl_spgist SET c1 = 1 WHERE c1 = 3;
+DELETE FROM tbl_spgist WHERE c1 = 5 OR c3 = 12;
+DROP TABLE tbl_spgist;
+/*
+ * 5. Alter column type.
+ */
+CREATE TABLE tbl_spgist (c1 int, c2 int, c3 int, c4 box);
+INSERT INTO tbl_spgist SELECT x, 2*x, 3*x, box(point(x,x+1),point(2*x,2*x+1)) FROM generate_series(1,10) AS x;
+CREATE INDEX tbl_spgist_idx ON tbl_spgist using spgist (c4) INCLUDE (c1,c3);
+ALTER TABLE tbl_spgist ALTER c1 TYPE bigint;
+ALTER TABLE tbl_spgist ALTER c3 TYPE bigint;
+\d tbl_spgist
+ Table "public.tbl_spgist"
+ Column | Type | Collation | Nullable | Default
+--------+---------+-----------+----------+---------
+ c1 | bigint | | |
+ c2 | integer | | |
+ c3 | bigint | | |
+ c4 | box | | |
+Indexes:
+ "tbl_spgist_idx" spgist (c4) INCLUDE (c1, c3)
+
+DROP TABLE tbl_spgist;
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 026ea880cd..985458a1a8 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -50,7 +50,7 @@ test: copy copyselect copydml insert insert_conflict
# ----------
test: create_misc create_operator create_procedure
# These depend on create_misc and create_operator
-test: create_index create_index_spgist create_view index_including index_including_gist
+test: create_index create_index_spgist create_view index_including index_including_gist index_including_spgist
# ----------
# Another group of parallel tests
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index 979d926119..f3df961535 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -68,6 +68,7 @@ test: create_index_spgist
test: create_view
test: index_including
test: index_including_gist
+test: index_including_spgist
test: create_aggregate
test: create_function_3
test: create_cast
diff --git a/src/test/regress/sql/index_including_spgist.sql b/src/test/regress/sql/index_including_spgist.sql
new file mode 100644
index 0000000000..c47f713d25
--- /dev/null
+++ b/src/test/regress/sql/index_including_spgist.sql
@@ -0,0 +1,80 @@
+/*
+ * 1.1. test CREATE INDEX with buffered build
+ */
+
+-- Regular index with included columns
+CREATE TABLE tbl_spgist (c1 int, c2 int, c3 int, c4 box);
+-- size is chosen to exceed page size and trigger actual truncation
+INSERT INTO tbl_spgist SELECT x, 2*x, 3*x, box(point(x,x+1),point(2*x,2*x+1)) FROM generate_series(1,8000) AS x;
+CREATE INDEX tbl_spgist_idx ON tbl_spgist using spgist (c4) INCLUDE (c1,c2,c3);
+SELECT pg_get_indexdef(i.indexrelid)
+FROM pg_index i JOIN pg_class c ON i.indexrelid = c.oid
+WHERE i.indrelid = 'tbl_spgist'::regclass ORDER BY c.relname;
+SELECT * FROM tbl_spgist where c4 <@ box(point(1,1),point(10,10));
+SET enable_bitmapscan TO off;
+EXPLAIN (costs off) SELECT * FROM tbl_spgist where c4 <@ box(point(1,1),point(10,10));
+SET enable_bitmapscan TO default;
+DROP TABLE tbl_spgist;
+
+/*
+ * 1.2. test CREATE INDEX with inserts
+ */
+
+-- Regular index with included columns
+CREATE TABLE tbl_spgist (c1 int, c2 int, c3 int, c4 box);
+-- size is chosen to exceed page size and trigger actual truncation
+CREATE INDEX tbl_spgist_idx ON tbl_spgist using spgist (c4) INCLUDE (c1,c2,c3);
+INSERT INTO tbl_spgist SELECT x, 2*x, 3*x, box(point(x,x+1),point(2*x,2*x+1)) FROM generate_series(1,8000) AS x;
+SELECT pg_get_indexdef(i.indexrelid)
+FROM pg_index i JOIN pg_class c ON i.indexrelid = c.oid
+WHERE i.indrelid = 'tbl_spgist'::regclass ORDER BY c.relname;
+SELECT * FROM tbl_spgist where c4 <@ box(point(1,1),point(10,10));
+SET enable_bitmapscan TO off;
+EXPLAIN (costs off) SELECT * FROM tbl_spgist where c4 <@ box(point(1,1),point(10,10));
+SET enable_bitmapscan TO default;
+DROP TABLE tbl_spgist;
+
+/*
+ * 2. CREATE INDEX CONCURRENTLY
+ */
+CREATE TABLE tbl_spgist (c1 int, c2 int, c3 int, c4 box);
+INSERT INTO tbl_spgist SELECT x, 2*x, 3*x, box(point(x,x+1),point(2*x,2*x+1)) FROM generate_series(1,10) AS x;
+CREATE INDEX CONCURRENTLY tbl_spgist_idx ON tbl_spgist using spgist (c4) INCLUDE (c1,c2,c3);
+SELECT indexdef FROM pg_indexes WHERE tablename = 'tbl_spgist' ORDER BY indexname;
+DROP TABLE tbl_spgist;
+
+
+/*
+ * 3. REINDEX
+ */
+CREATE TABLE tbl_spgist (c1 int, c2 int, c3 int, c4 box);
+INSERT INTO tbl_spgist SELECT x, 2*x, 3*x, box(point(x,x+1),point(2*x,2*x+1)) FROM generate_series(1,10) AS x;
+CREATE INDEX tbl_spgist_idx ON tbl_spgist using spgist (c4) INCLUDE (c1,c3);
+SELECT indexdef FROM pg_indexes WHERE tablename = 'tbl_spgist' ORDER BY indexname;
+REINDEX INDEX tbl_spgist_idx;
+SELECT indexdef FROM pg_indexes WHERE tablename = 'tbl_spgist' ORDER BY indexname;
+ALTER TABLE tbl_spgist DROP COLUMN c1;
+SELECT indexdef FROM pg_indexes WHERE tablename = 'tbl_spgist' ORDER BY indexname;
+DROP TABLE tbl_spgist;
+
+/*
+ * 4. Update, delete values in indexed table.
+ */
+CREATE TABLE tbl_spgist (c1 int, c2 int, c3 int, c4 box);
+INSERT INTO tbl_spgist SELECT x, 2*x, 3*x, box(point(x,x+1),point(2*x,2*x+1)) FROM generate_series(1,10) AS x;
+CREATE INDEX tbl_spgist_idx ON tbl_spgist using spgist (c4) INCLUDE (c1,c3);
+UPDATE tbl_spgist SET c1 = 100 WHERE c1 = 2;
+UPDATE tbl_spgist SET c1 = 1 WHERE c1 = 3;
+DELETE FROM tbl_spgist WHERE c1 = 5 OR c3 = 12;
+DROP TABLE tbl_spgist;
+
+/*
+ * 5. Alter column type.
+ */
+CREATE TABLE tbl_spgist (c1 int, c2 int, c3 int, c4 box);
+INSERT INTO tbl_spgist SELECT x, 2*x, 3*x, box(point(x,x+1),point(2*x,2*x+1)) FROM generate_series(1,10) AS x;
+CREATE INDEX tbl_spgist_idx ON tbl_spgist using spgist (c4) INCLUDE (c1,c3);
+ALTER TABLE tbl_spgist ALTER c1 TYPE bigint;
+ALTER TABLE tbl_spgist ALTER c3 TYPE bigint;
+\d tbl_spgist
+DROP TABLE tbl_spgist;
--
2.28.0
On 24.08.2020 16:19, Pavel Borisov wrote:
I added some extra comments and mentions in manual to make all the
things clear (see v7 patch)
The patch implements the proposed functionality, passes tests, and in
general looks good to me.
I don't mind using a macro to differentiate tuples with and without
included attributes. Any approach will require code changes. Though, I
don't have a strong opinion about that.
A bit of nitpicking:
1) You mention backward compatibility in some comments. But, after this
patch will be committed, it will be uneasy to distinct new and old
phrases. So I suggest to rephrase them. You can either refer a
specific version or just call it "compatibility with indexes without
included attributes".
2) SpgLeafSize() function name seems misleading, as it actually refers
to a tuple's size, not a leaf page. I suggest to rename it to
SpgLeafTupleSize().
3) I didn't quite get the meaning of the assertion, that is added in a
few places:
Assert(so->state.includeTupdesc->natts);
Should it be Assert(so->state.includeTupdesc->natts > 1) ?
4) There are a few typos in comments and docs:
s/colums/columns
s/include attribute/included attribute
and so on.
5) This comment in index_including.sql is outdated:
* 7. Check various AMs. All but btree and gist must fail.
6) New test lacks SET enable_seqscan TO off;
in addition to SET enable_bitmapscan TO off;
I also wonder, why both index_including_spgist.sql and
index_including.sql tests are stable without running 'vacuum analyze'
before the EXPLAIN that shows Index Only Scan plan. Is autovacuum just
always fast enough to fill a visibility map, or I miss something?
--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
3) I didn't quite get the meaning of the assertion, that is added in a few
places:
Assert(so->state.includeTupdesc->natts);
Should it be Assert(so->state.includeTupdesc->natts > 1) ?
It is rather Assert(so->state.includeTupdesc->natts > 0) as INCLUDE tuple
descriptor should not be initialized and filled in case of index without
INCLUDE attributes and doesn't contain any info about key attribute which
is processed by SpGist existing way separately for different SpGist tuple
types i.e. leaf, prefix=inner and label tuples. So only INCLUDE attributes
are counted there. This and similar Asserts are for the case includeTupdesc
becomes mistakenly initialized by some future code change.
I completely agree with all the other suggestions and made corrections (see
v8). Thank you very much for your review!
Also there is a separate patch 0002 to add VACUUM ANALYZE to
index_including test which is not necessary for covering spgist.
One more point to note: in spgist_private.h I needed to shift down whole
block between
*"typedef struct SpGistSearchItem"*
*and *
*"} SpGistCache;"*
to position it below tuples types declarations to insert pointer
"SpGistLeafTuple leafTuple"; into struct SpGistSearchItem. This is the only
change in this block and I apologize for possible inconvenience to review
this change.
--
Best regards,
Pavel Borisov
Postgres Professional: http://postgrespro.com <http://www.postgrespro.com>
Attachments:
v8-0001-Covering-SP-GiST-index-support-for-INCLUDE-column.patchapplication/octet-stream; name=v8-0001-Covering-SP-GiST-index-support-for-INCLUDE-column.patchDownload
From 0f8140f874db2a5786ccbc3a67b3910722f59f5b Mon Sep 17 00:00:00 2001
From: Pavel Borisov <pashkin.elfe@gmail.com>
Date: Thu, 27 Aug 2020 19:37:44 +0400
Subject: [PATCH v8] Covering SP-GiST index - support for INCLUDE columns
Adding INCLUDE columns for SPGiST index is intended to increase the speed of queries by making scans index-only likewise
in btree and GiST index. These columns are added only to leaf tuples and they are not used in index tree search but they
can be fetched during index scan.
The other point of INCLUDE columns is to overcome SP-GiST limitation of being single-column in principle. I.e. in certain
cases a single covering SP-GiST index can replace several separate ones with less disk space and shared buffers
consumption, faster, update etc. Also, any data types without SP-GiST supported opclasses can be included.
Discussion: https://www.postgresql.org/message-id/flat/CALT9ZEFi-vMp4faht9f9Junb1nO3NOSjhpxTmbm1UGLMsLqiEQ@mail.gmail.com
---
doc/src/sgml/indices.sgml | 4 +-
doc/src/sgml/ref/create_index.sgml | 4 +-
doc/src/sgml/spgist.sgml | 8 +
src/backend/access/spgist/README | 21 +-
src/backend/access/spgist/spgdoinsert.c | 172 +++++---
src/backend/access/spgist/spginsert.c | 5 +-
src/backend/access/spgist/spgscan.c | 87 +++-
src/backend/access/spgist/spgutils.c | 389 ++++++++++++++++--
src/backend/access/spgist/spgvacuum.c | 25 +-
src/backend/access/spgist/spgxlog.c | 6 +-
src/include/access/spgist_private.h | 263 +++++++-----
src/test/regress/expected/amutils.out | 4 +-
src/test/regress/expected/index_including.out | 3 +-
.../expected/index_including_spgist.out | 143 +++++++
src/test/regress/parallel_schedule | 2 +-
src/test/regress/serial_schedule | 1 +
src/test/regress/sql/index_including.sql | 2 +-
.../regress/sql/index_including_spgist.sql | 84 ++++
18 files changed, 982 insertions(+), 241 deletions(-)
create mode 100644 src/test/regress/expected/index_including_spgist.out
create mode 100644 src/test/regress/sql/index_including_spgist.sql
diff --git a/doc/src/sgml/indices.sgml b/doc/src/sgml/indices.sgml
index 28adaba72d..c89cc6cb08 100644
--- a/doc/src/sgml/indices.sgml
+++ b/doc/src/sgml/indices.sgml
@@ -1194,8 +1194,8 @@ CREATE UNIQUE INDEX tab_x_y ON tab(x) INCLUDE (y);
likely to not need to access the heap. If the heap tuple must be visited
anyway, it costs nothing more to get the column's value from there.
Other restrictions are that expressions are not currently supported as
- included columns, and that only B-tree and GiST indexes currently support
- included columns.
+ included columns, and that only B-tree, GiST and SP-GiST indexes currently
+ support included columns.
</para>
<para>
diff --git a/doc/src/sgml/ref/create_index.sgml b/doc/src/sgml/ref/create_index.sgml
index ff87b2d28f..3d360bcf47 100644
--- a/doc/src/sgml/ref/create_index.sgml
+++ b/doc/src/sgml/ref/create_index.sgml
@@ -187,8 +187,8 @@ CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ [ IF NOT EXISTS ] <replaceable class=
</para>
<para>
- Currently, the B-tree and the GiST index access methods support this
- feature. In B-tree and the GiST indexes, the values of columns listed
+ Currently, the B-tree, GiST and SP-GiST index access methods support
+ this feature. In these indexes, the values of columns listed
in the <literal>INCLUDE</literal> clause are included in leaf tuples
which correspond to heap tuples, but are not included in upper-level
index entries used for tree navigation.
diff --git a/doc/src/sgml/spgist.sgml b/doc/src/sgml/spgist.sgml
index 0e04a08679..868a140a6a 100644
--- a/doc/src/sgml/spgist.sgml
+++ b/doc/src/sgml/spgist.sgml
@@ -240,6 +240,14 @@
inner tuples that are passed through to reach the leaf level.
</para>
+ <para>
+ In case when <acronym>SP-GiST</acronym> index is created with
+ <literal>INCLUDE</literal> clause i.e. covering index, leaf tuples also
+ contain data from included columns. This data is stored uncompressed and can have
+ data types without any SP-GiST operator class.
+
+ </para>
+
<para>
Inner tuples are more complex, since they are branching points in the
search tree. Each inner tuple contains a set of one or more
diff --git a/src/backend/access/spgist/README b/src/backend/access/spgist/README
index b55b073832..55b515f03d 100644
--- a/src/backend/access/spgist/README
+++ b/src/backend/access/spgist/README
@@ -73,9 +73,22 @@ Leaf tuple consists of:
Example:
radix tree - the rest of string (postfix)
quad and k-d tree - the point itself
-
ItemPointer to the heap
-
+ nextOffset number of next leaf tuple in a chain on a leaf page
+ optional nullmask for INCLUDE columns
+ optional INCLUDE columns values
+
+Leaf tuple layout changed since PostgreSQL version 14 to support INCLUDE
+columns but in a way that doesn't change the header and the key value
+placement in a tuple. So indexes created earlier remain fully supported.
+
+Also it is intended to be laid out with minimum possible gaps to make index
+smaller. I.e. first header of 12 bytes, then a key value starting from
+maxalign boundary, then just immediately nulls mask bytes, then INCLUDE
+attributes each starting from its typealign boundary. So in many cases,
+nullmask is stored free of charge and tuple occupy minimum possible space
+(with exception of gap before key value which starts from maxalign for
+compatibility).
NULLS HANDLING
@@ -90,6 +103,10 @@ Insertions and searches in the nulls tree do not use any of the
opclass-supplied functions, but just use hardwired logic comparable to
AllTheSame cases in the normal tree.
+For INCLUDE attributes nulls are handled in ordinary per leaf-tuple way i.e.
+if null mask presence bit in a header is set, nullmask is added just after
+key value before the first INCLUDE attribute. Note that nullmask presence
+bit and nullmask itself apply only to INCLUDE attributes.
INSERTION ALGORITHM
diff --git a/src/backend/access/spgist/spgdoinsert.c b/src/backend/access/spgist/spgdoinsert.c
index 934d65b89f..a5994c7100 100644
--- a/src/backend/access/spgist/spgdoinsert.c
+++ b/src/backend/access/spgist/spgdoinsert.c
@@ -22,7 +22,7 @@
#include "miscadmin.h"
#include "storage/bufmgr.h"
#include "utils/rel.h"
-
+#include "access/htup_details.h"
/*
* SPPageDesc tracks all info about a page we are inserting into. In some
@@ -220,7 +220,7 @@ addLeafTuple(Relation index, SpGistState *state, SpGistLeafTuple leafTuple,
SpGistBlockIsRoot(current->blkno))
{
/* Tuple is not part of a chain */
- leafTuple->nextOffset = InvalidOffsetNumber;
+ SGLT_SET_OFFSET(leafTuple->nextOffset, InvalidOffsetNumber);
current->offnum = SpGistPageAddNewItem(state, current->page,
(Item) leafTuple, leafTuple->size,
NULL, false);
@@ -253,7 +253,7 @@ addLeafTuple(Relation index, SpGistState *state, SpGistLeafTuple leafTuple,
PageGetItemId(current->page, current->offnum));
if (head->tupstate == SPGIST_LIVE)
{
- leafTuple->nextOffset = head->nextOffset;
+ SGLT_SET_OFFSET(leafTuple->nextOffset, SGLT_GET_OFFSET(head->nextOffset));
offnum = SpGistPageAddNewItem(state, current->page,
(Item) leafTuple, leafTuple->size,
NULL, false);
@@ -264,14 +264,14 @@ addLeafTuple(Relation index, SpGistState *state, SpGistLeafTuple leafTuple,
*/
head = (SpGistLeafTuple) PageGetItem(current->page,
PageGetItemId(current->page, current->offnum));
- head->nextOffset = offnum;
+ SGLT_SET_OFFSET(head->nextOffset, offnum);
xlrec.offnumLeaf = offnum;
xlrec.offnumHeadLeaf = current->offnum;
}
else if (head->tupstate == SPGIST_DEAD)
{
- leafTuple->nextOffset = InvalidOffsetNumber;
+ SGLT_SET_OFFSET(leafTuple->nextOffset, InvalidOffsetNumber);
PageIndexTupleDelete(current->page, current->offnum);
if (PageAddItem(current->page,
(Item) leafTuple, leafTuple->size,
@@ -362,13 +362,13 @@ checkSplitConditions(Relation index, SpGistState *state,
{
/* We could see a DEAD tuple as first/only chain item */
Assert(i == current->offnum);
- Assert(it->nextOffset == InvalidOffsetNumber);
+ Assert(SGLT_GET_OFFSET(it->nextOffset) == InvalidOffsetNumber);
/* Don't count it in result, because it won't go to other page */
}
else
elog(ERROR, "unexpected SPGiST tuple state: %d", it->tupstate);
- i = it->nextOffset;
+ i = SGLT_GET_OFFSET(it->nextOffset);
}
*nToSplit = n;
@@ -437,7 +437,7 @@ moveLeafs(Relation index, SpGistState *state,
{
/* We could see a DEAD tuple as first/only chain item */
Assert(i == current->offnum);
- Assert(it->nextOffset == InvalidOffsetNumber);
+ Assert(SGLT_GET_OFFSET(it->nextOffset) == InvalidOffsetNumber);
/* We don't want to move it, so don't count it in size */
toDelete[nDelete] = i;
nDelete++;
@@ -446,7 +446,7 @@ moveLeafs(Relation index, SpGistState *state,
else
elog(ERROR, "unexpected SPGiST tuple state: %d", it->tupstate);
- i = it->nextOffset;
+ i = SGLT_GET_OFFSET(it->nextOffset);
}
/* Find a leaf page that will hold them */
@@ -475,7 +475,7 @@ moveLeafs(Relation index, SpGistState *state,
* don't care). We're modifying the tuple on the source page
* here, but it's okay since we're about to delete it.
*/
- it->nextOffset = r;
+ SGLT_SET_OFFSET(it->nextOffset, r);
r = SpGistPageAddNewItem(state, npage, (Item) it, it->size,
&startOffset, false);
@@ -490,7 +490,7 @@ moveLeafs(Relation index, SpGistState *state,
}
/* add the new tuple as well */
- newLeafTuple->nextOffset = r;
+ SGLT_SET_OFFSET(newLeafTuple->nextOffset, r);
r = SpGistPageAddNewItem(state, npage,
(Item) newLeafTuple, newLeafTuple->size,
&startOffset, false);
@@ -709,6 +709,9 @@ doPickSplit(Relation index, SpGistState *state,
int nToDelete,
nToInsert,
maxToInclude;
+ Datum *leafChainDatums;
+ bool *leafChainIsnulls;
+ const int natts = IndexRelationGetNumberOfAttributes(index);
in.level = level;
@@ -723,14 +726,16 @@ doPickSplit(Relation index, SpGistState *state,
toInsert = (OffsetNumber *) palloc(sizeof(OffsetNumber) * n);
newLeafs = (SpGistLeafTuple *) palloc(sizeof(SpGistLeafTuple) * n);
leafPageSelect = (uint8 *) palloc(sizeof(uint8) * n);
-
STORE_STATE(state, xlrec.stateSrc);
+ leafChainDatums = (Datum *) palloc(n * natts * sizeof(Datum));
+ leafChainIsnulls = (bool *) palloc(n * natts * sizeof(bool));
+
/*
- * Form list of leaf tuples which will be distributed as split result;
- * also, count up the amount of space that will be freed from current.
- * (Note that in the non-root case, we won't actually delete the old
- * tuples, only replace them with redirects or placeholders.)
+ * Collect leaf tuples which will be distributed as split result; also,
+ * count up the amount of space that will be freed from current. (Note
+ * that in the non-root case, we won't actually delete the old tuples,
+ * only replace them with redirects or placeholders.)
*
* Note: the SGLTDATUM calls here are safe even when dealing with a nulls
* page. For a pass-by-value data type we will fetch a word that must
@@ -738,7 +743,15 @@ doPickSplit(Relation index, SpGistState *state,
* tuples must have size at least SGDTSIZE). For a pass-by-reference type
* we are just computing a pointer that isn't going to get dereferenced.
* So it's not worth guarding the calls with isNulls checks.
+ *
+ * Datums and isnulls of all leaf tuple attributes in the chain are
+ * collected into 2-d arrays: (number of tuples in the chain) x (number of
+ * attributes) The first attribute is key, the other - INCLUDE attributes (if
+ * any). After picksplit we need to form new leaf tuples as the key attribute
+ * length can change which can affect the alignment of every INCLUDE
+ * attribute.
*/
+
nToInsert = 0;
nToDelete = 0;
spaceToDelete = 0;
@@ -759,6 +772,8 @@ doPickSplit(Relation index, SpGistState *state,
{
in.datums[nToInsert] = SGLTDATUM(it, state);
heapPtrs[nToInsert] = it->heapPtr;
+ spgDeformLeafTuple(it, state, leafChainDatums + nToInsert * natts,
+ leafChainIsnulls + nToInsert * natts, isNulls);
nToInsert++;
toDelete[nToDelete] = i;
nToDelete++;
@@ -784,6 +799,9 @@ doPickSplit(Relation index, SpGistState *state,
{
in.datums[nToInsert] = SGLTDATUM(it, state);
heapPtrs[nToInsert] = it->heapPtr;
+
+ spgDeformLeafTuple(it, state, leafChainDatums + nToInsert * natts,
+ leafChainIsnulls + nToInsert * natts, isNulls);
nToInsert++;
toDelete[nToDelete] = i;
nToDelete++;
@@ -795,7 +813,7 @@ doPickSplit(Relation index, SpGistState *state,
{
/* We could see a DEAD tuple as first/only chain item */
Assert(i == current->offnum);
- Assert(it->nextOffset == InvalidOffsetNumber);
+ Assert(SGLT_GET_OFFSET(it->nextOffset) == InvalidOffsetNumber);
toDelete[nToDelete] = i;
nToDelete++;
/* replacing it with redirect will save no space */
@@ -803,7 +821,7 @@ doPickSplit(Relation index, SpGistState *state,
else
elog(ERROR, "unexpected SPGiST tuple state: %d", it->tupstate);
- i = it->nextOffset;
+ i = SGLT_GET_OFFSET(it->nextOffset);
}
}
in.nTuples = nToInsert;
@@ -816,10 +834,17 @@ doPickSplit(Relation index, SpGistState *state,
*/
in.datums[in.nTuples] = SGLTDATUM(newLeafTuple, state);
heapPtrs[in.nTuples] = newLeafTuple->heapPtr;
+
+ spgDeformLeafTuple(newLeafTuple, state, leafChainDatums + (in.nTuples) * natts,
+ leafChainIsnulls + (in.nTuples) * natts, isNulls);
in.nTuples++;
memset(&out, 0, sizeof(out));
+ /*
+ * Process collected key values of tuples from the chain. Included values
+ * are used to build fresh leaf tuples unchanged.
+ */
if (!isNulls)
{
/*
@@ -837,9 +862,11 @@ doPickSplit(Relation index, SpGistState *state,
totalLeafSizes = 0;
for (i = 0; i < in.nTuples; i++)
{
- newLeafs[i] = spgFormLeafTuple(state, heapPtrs + i,
- out.leafTupleDatums[i],
- false);
+ *(leafChainDatums + i * natts) = (Datum) out.leafTupleDatums[i];
+ *(leafChainIsnulls + i * natts) = false;
+
+ newLeafs[i] = spgFormLeafTuple(state, heapPtrs + i, leafChainDatums + i * natts,
+ leafChainIsnulls + i * natts);
totalLeafSizes += newLeafs[i]->size + sizeof(ItemIdData);
}
}
@@ -860,9 +887,14 @@ doPickSplit(Relation index, SpGistState *state,
totalLeafSizes = 0;
for (i = 0; i < in.nTuples; i++)
{
- newLeafs[i] = spgFormLeafTuple(state, heapPtrs + i,
- (Datum) 0,
- true);
+ /*
+ * Nulls tree can contain only null key values.
+ */
+ *(leafChainDatums + i * natts) = (Datum) 0;
+ *(leafChainIsnulls + i * natts) = true;
+
+ newLeafs[i] = spgFormLeafTuple(state, heapPtrs + i, leafChainDatums + i * natts,
+ leafChainIsnulls + i * natts);
totalLeafSizes += newLeafs[i]->size + sizeof(ItemIdData);
}
}
@@ -1196,10 +1228,10 @@ doPickSplit(Relation index, SpGistState *state,
if (ItemPointerIsValid(&nodes[n]->t_tid))
{
Assert(ItemPointerGetBlockNumber(&nodes[n]->t_tid) == leafBlock);
- it->nextOffset = ItemPointerGetOffsetNumber(&nodes[n]->t_tid);
+ SGLT_SET_OFFSET(it->nextOffset, ItemPointerGetOffsetNumber(&nodes[n]->t_tid));
}
else
- it->nextOffset = InvalidOffsetNumber;
+ SGLT_SET_OFFSET(it->nextOffset, InvalidOffsetNumber);
/* Insert it on page */
newoffset = SpGistPageAddNewItem(state, BufferGetPage(leafBuffer),
@@ -1889,67 +1921,83 @@ spgSplitNodeAction(Relation index, SpGistState *state,
*/
bool
spgdoinsert(Relation index, SpGistState *state,
- ItemPointer heapPtr, Datum datum, bool isnull)
+ ItemPointer heapPtr, Datum *datum, bool *isnull)
{
int level = 0;
- Datum leafDatum;
+ Datum *leafDatum;
int leafSize;
SPPageDesc current,
parent;
FmgrInfo *procinfo = NULL;
+ int i;
/*
* Look up FmgrInfo of the user-defined choose function once, to save
* cycles in the loop below.
*/
- if (!isnull)
+ if (!isnull[0])
procinfo = index_getprocinfo(index, 1, SPGIST_CHOOSE_PROC);
/*
* Prepare the leaf datum to insert.
- *
+ */
+
+ leafDatum = (Datum *) palloc0(sizeof(Datum) * (IndexRelationGetNumberOfAttributes(index)));
+
+ /*
* If an optional "compress" method is provided, then call it to form the
- * leaf datum from the input datum. Otherwise store the input datum as
- * is. Since we don't use index_form_tuple in this AM, we have to make
- * sure value to be inserted is not toasted; FormIndexDatum doesn't
- * guarantee that. But we assume the "compress" method to return an
- * untoasted value.
+ * key datum from the input datum. Otherwise, store the input datum as is.
+ * Since we don't use index_form_tuple in this AM, we have to make sure
+ * value to be inserted is not toasted; FormIndexDatum doesn't guarantee
+ * that. But we assume the "compress" method to return an untoasted
+ * value.
*/
- if (!isnull)
+ if (!isnull[0])
{
if (OidIsValid(index_getprocid(index, 1, SPGIST_COMPRESS_PROC)))
{
FmgrInfo *compressProcinfo = NULL;
compressProcinfo = index_getprocinfo(index, 1, SPGIST_COMPRESS_PROC);
- leafDatum = FunctionCall1Coll(compressProcinfo,
- index->rd_indcollation[0],
- datum);
+ leafDatum[0] = FunctionCall1Coll(compressProcinfo,
+ index->rd_indcollation[0],
+ datum[0]);
}
else
{
Assert(state->attLeafType.type == state->attType.type);
if (state->attType.attlen == -1)
- leafDatum = PointerGetDatum(PG_DETOAST_DATUM(datum));
+ leafDatum[0] = PointerGetDatum(PG_DETOAST_DATUM(datum[0]));
else
- leafDatum = datum;
+ leafDatum[0] = datum[0];
}
}
else
- leafDatum = (Datum) 0;
+ leafDatum[0] = (Datum) 0;
+
+ for (i = 1; i < IndexRelationGetNumberOfAttributes(index); i++)
+ {
+ if (!isnull[i])
+ {
+ if (TupleDescAttr(state->includeTupdesc, i - 1)->attlen == -1)
+ leafDatum[i] = PointerGetDatum(PG_DETOAST_DATUM(datum[i]));
+ else
+ leafDatum[i] = datum[i];
+ }
+ else
+ leafDatum[i] = (Datum) 0;
+ }
+
/*
- * Compute space needed for a leaf tuple containing the given datum.
+ * Compute space needed on a page for a leaf tuple containing the given
+ * datum.
*
* If it isn't gonna fit, and the opclass can't reduce the datum size by
* suffixing, bail out now rather than getting into an endless loop.
*/
- if (!isnull)
- leafSize = SGLTHDRSZ + sizeof(ItemIdData) +
- SpGistGetTypeSize(&state->attLeafType, leafDatum);
- else
- leafSize = SGDTSIZE + sizeof(ItemIdData);
+ leafSize = spgLeafTupleSize(state, leafDatum, isnull) + sizeof(ItemIdData);
if (leafSize > SPGIST_PAGE_CAPACITY && !state->config.longValuesOK)
ereport(ERROR,
@@ -1961,7 +2009,7 @@ spgdoinsert(Relation index, SpGistState *state,
errhint("Values larger than a buffer page cannot be indexed.")));
/* Initialize "current" to the appropriate root page */
- current.blkno = isnull ? SPGIST_NULL_BLKNO : SPGIST_ROOT_BLKNO;
+ current.blkno = isnull[0] ? SPGIST_NULL_BLKNO : SPGIST_ROOT_BLKNO;
current.buffer = InvalidBuffer;
current.page = NULL;
current.offnum = FirstOffsetNumber;
@@ -1995,7 +2043,7 @@ spgdoinsert(Relation index, SpGistState *state,
*/
current.buffer =
SpGistGetBuffer(index,
- GBUF_LEAF | (isnull ? GBUF_NULLS : 0),
+ GBUF_LEAF | (isnull[0] ? GBUF_NULLS : 0),
Min(leafSize, SPGIST_PAGE_CAPACITY),
&isNew);
current.blkno = BufferGetBlockNumber(current.buffer);
@@ -2037,7 +2085,7 @@ spgdoinsert(Relation index, SpGistState *state,
current.page = BufferGetPage(current.buffer);
/* should not arrive at a page of the wrong type */
- if (isnull ? !SpGistPageStoresNulls(current.page) :
+ if (isnull[0] ? !SpGistPageStoresNulls(current.page) :
SpGistPageStoresNulls(current.page))
elog(ERROR, "SPGiST index page %u has wrong nulls flag",
current.blkno);
@@ -2054,7 +2102,7 @@ spgdoinsert(Relation index, SpGistState *state,
{
/* it fits on page, so insert it and we're done */
addLeafTuple(index, state, leafTuple,
- ¤t, &parent, isnull, isNew);
+ ¤t, &parent, isnull[0], isNew);
break;
}
else if ((sizeToSplit =
@@ -2068,14 +2116,14 @@ spgdoinsert(Relation index, SpGistState *state,
* chain to another leaf page rather than splitting it.
*/
Assert(!isNew);
- moveLeafs(index, state, ¤t, &parent, leafTuple, isnull);
+ moveLeafs(index, state, ¤t, &parent, leafTuple, isnull[0]);
break; /* we're done */
}
else
{
/* picksplit */
if (doPickSplit(index, state, ¤t, &parent,
- leafTuple, level, isnull, isNew))
+ leafTuple, level, isnull[0], isNew))
break; /* doPickSplit installed new tuples */
/* leaf tuple will not be inserted yet */
@@ -2110,8 +2158,8 @@ spgdoinsert(Relation index, SpGistState *state,
innerTuple = (SpGistInnerTuple) PageGetItem(current.page,
PageGetItemId(current.page, current.offnum));
- in.datum = datum;
- in.leafDatum = leafDatum;
+ in.datum = datum[0];
+ in.leafDatum = leafDatum[0];
in.level = level;
in.allTheSame = innerTuple->allTheSame;
in.hasPrefix = (innerTuple->prefixSize > 0);
@@ -2121,7 +2169,7 @@ spgdoinsert(Relation index, SpGistState *state,
memset(&out, 0, sizeof(out));
- if (!isnull)
+ if (!isnull[0])
{
/* use user-defined choose method */
FunctionCall2Coll(procinfo,
@@ -2158,11 +2206,11 @@ spgdoinsert(Relation index, SpGistState *state,
/* Adjust level as per opclass request */
level += out.result.matchNode.levelAdd;
/* Replace leafDatum and recompute leafSize */
- if (!isnull)
+ if (!isnull[0])
{
- leafDatum = out.result.matchNode.restDatum;
- leafSize = SGLTHDRSZ + sizeof(ItemIdData) +
- SpGistGetTypeSize(&state->attLeafType, leafDatum);
+ leafDatum[0] = out.result.matchNode.restDatum;
+ leafSize = spgLeafTupleSize(state, leafDatum, isnull) +
+ sizeof(ItemIdData);
}
/*
@@ -2227,6 +2275,6 @@ spgdoinsert(Relation index, SpGistState *state,
SpGistSetLastUsedPage(index, parent.buffer);
UnlockReleaseBuffer(parent.buffer);
}
-
+ pfree(leafDatum);
return true;
}
diff --git a/src/backend/access/spgist/spginsert.c b/src/backend/access/spgist/spginsert.c
index e4508a2b92..b54ae85f6e 100644
--- a/src/backend/access/spgist/spginsert.c
+++ b/src/backend/access/spgist/spginsert.c
@@ -55,8 +55,7 @@ spgistBuildCallback(Relation index, ItemPointer tid, Datum *values,
* lock on some buffer. So we need to be willing to retry. We can flush
* any temp data when retrying.
*/
- while (!spgdoinsert(index, &buildstate->spgstate, tid,
- *values, *isnull))
+ while (!spgdoinsert(index, &buildstate->spgstate, tid, values, isnull))
{
MemoryContextReset(buildstate->tmpCtx);
}
@@ -226,7 +225,7 @@ spginsert(Relation index, Datum *values, bool *isnull,
* to avoid cumulative memory consumption. That means we also have to
* redo initSpGistState(), but it's cheap enough not to matter.
*/
- while (!spgdoinsert(index, &spgstate, ht_ctid, *values, *isnull))
+ while (!spgdoinsert(index, &spgstate, ht_ctid, values, isnull))
{
MemoryContextReset(insertCtx);
initSpGistState(&spgstate, index);
diff --git a/src/backend/access/spgist/spgscan.c b/src/backend/access/spgist/spgscan.c
index 4d506bfb9a..fbf8bd5435 100644
--- a/src/backend/access/spgist/spgscan.c
+++ b/src/backend/access/spgist/spgscan.c
@@ -28,7 +28,8 @@
typedef void (*storeRes_func) (SpGistScanOpaque so, ItemPointer heapPtr,
Datum leafValue, bool isNull, bool recheck,
- bool recheckDistances, double *distances);
+ bool recheckDistances, double *distances,
+ SpGistLeafTuple leafTuple);
/*
* Pairing heap comparison function for the SpGistSearchItem queue.
@@ -88,6 +89,9 @@ spgFreeSearchItem(SpGistScanOpaque so, SpGistSearchItem *item)
if (item->traversalValue)
pfree(item->traversalValue);
+ if (item->isLeaf && item->leafTuple)
+ pfree(item->leafTuple);
+
pfree(item);
}
@@ -134,6 +138,8 @@ spgAddStartItem(SpGistScanOpaque so, bool isnull)
startEntry->recheck = false;
startEntry->recheckDistances = false;
+ startEntry->leafTuple = NULL;
+
spgAddSearchItemToQueue(so, startEntry);
}
@@ -438,14 +444,30 @@ spgendscan(IndexScanDesc scan)
* Leaf SpGistSearchItem constructor, called in queue context
*/
static SpGistSearchItem *
-spgNewHeapItem(SpGistScanOpaque so, int level, ItemPointer heapPtr,
+spgNewHeapItem(SpGistScanOpaque so, int level, SpGistLeafTuple leafTuple,
Datum leafValue, bool recheck, bool recheckDistances,
bool isnull, double *distances)
{
SpGistSearchItem *item = spgAllocSearchItem(so, isnull, distances);
+ /*
+ * If there are INCLUDE attributes search item in the queue should contain
+ * them.
+ */
+ if (so->state.includeTupdesc)
+ {
+ Assert(so->state.includeTupdesc->natts);
+
+ item->leafTuple = palloc(leafTuple->size);
+ memcpy(item->leafTuple, leafTuple, leafTuple->size);
+ }
+ else
+ {
+ item->leafTuple = NULL;
+ }
+
item->level = level;
- item->heapPtr = *heapPtr;
+ item->heapPtr = leafTuple->heapPtr;
/* copy value to queue cxt out of tmp cxt */
item->value = isnull ? (Datum) 0 :
datumCopy(leafValue, so->state.attLeafType.attbyval,
@@ -503,6 +525,8 @@ spgLeafTest(SpGistScanOpaque so, SpGistSearchItem *item,
in.returnData = so->want_itup;
in.leafDatum = SGLTDATUM(leafTuple, &so->state);
+
+
out.leafValue = (Datum) 0;
out.recheck = false;
out.distances = NULL;
@@ -528,7 +552,7 @@ spgLeafTest(SpGistScanOpaque so, SpGistSearchItem *item,
/* the scan is ordered -> add the item to the queue */
MemoryContext oldCxt = MemoryContextSwitchTo(so->traversalCxt);
SpGistSearchItem *heapItem = spgNewHeapItem(so, item->level,
- &leafTuple->heapPtr,
+ leafTuple,
leafValue,
recheck,
recheckDistances,
@@ -543,8 +567,10 @@ spgLeafTest(SpGistScanOpaque so, SpGistSearchItem *item,
{
/* non-ordered scan, so report the item right away */
Assert(!recheckDistances);
+
storeRes(so, &leafTuple->heapPtr, leafValue, isnull,
- recheck, false, NULL);
+ recheck, false, NULL, leafTuple);
+
*reportedSome = true;
}
}
@@ -736,7 +762,7 @@ spgTestLeafTuple(SpGistScanOpaque so,
/* dead tuple should be first in chain */
Assert(offset == ItemPointerGetOffsetNumber(&item->heapPtr));
/* No live entries on this page */
- Assert(leafTuple->nextOffset == InvalidOffsetNumber);
+ Assert(SGLT_GET_OFFSET(leafTuple->nextOffset) == InvalidOffsetNumber);
return SpGistBreakOffsetNumber;
}
}
@@ -750,7 +776,7 @@ spgTestLeafTuple(SpGistScanOpaque so,
spgLeafTest(so, item, leafTuple, isnull, reportedSome, storeRes);
- return leafTuple->nextOffset;
+ return SGLT_GET_OFFSET(leafTuple->nextOffset);
}
/*
@@ -782,8 +808,8 @@ redirect:
{
/* We store heap items in the queue only in case of ordered search */
Assert(so->numberOfNonNullOrderBys > 0);
- storeRes(so, &item->heapPtr, item->value, item->isNull,
- item->recheck, item->recheckDistances, item->distances);
+ storeRes(so, &item->heapPtr, item->value, item->isNull, item->recheck,
+ item->recheckDistances, item->distances, item->leafTuple);
reportedSome = true;
}
else
@@ -877,7 +903,7 @@ redirect:
static void
storeBitmap(SpGistScanOpaque so, ItemPointer heapPtr,
Datum leafValue, bool isnull, bool recheck, bool recheckDistances,
- double *distances)
+ double *distances, SpGistLeafTuple leafTuple)
{
Assert(!recheckDistances && !distances);
tbm_add_tuples(so->tbm, heapPtr, 1, recheck);
@@ -904,7 +930,7 @@ spggetbitmap(IndexScanDesc scan, TIDBitmap *tbm)
static void
storeGettuple(SpGistScanOpaque so, ItemPointer heapPtr,
Datum leafValue, bool isnull, bool recheck, bool recheckDistances,
- double *nonNullDistances)
+ double *nonNullDistances, SpGistLeafTuple leafTuple)
{
Assert(so->nPtrs < MaxIndexTuplesPerPage);
so->heapPtrs[so->nPtrs] = *heapPtr;
@@ -949,9 +975,38 @@ storeGettuple(SpGistScanOpaque so, ItemPointer heapPtr,
* Reconstruct index data. We have to copy the datum out of the temp
* context anyway, so we may as well create the tuple here.
*/
- so->reconTups[so->nPtrs] = heap_form_tuple(so->indexTupDesc,
- &leafValue,
- &isnull);
+ if (so->state.includeTupdesc)
+ {
+ /* Add INCLUDE attributes */
+ Datum *leafDatums;
+ bool *leafIsnulls;
+
+ Assert(so->state.includeTupdesc->natts);
+
+ leafDatums = (Datum *) palloc(sizeof(Datum) * (so->state.includeTupdesc->natts + 1));
+ leafIsnulls = (bool *) palloc(sizeof(bool) * (so->state.includeTupdesc->natts + 1));
+
+ spgDeformLeafTuple(leafTuple, &so->state, leafDatums, leafIsnulls, isnull);
+
+ /*
+ * override key value extracted from LeafTuple in case we've
+ * reconstructed it already
+ */
+ leafDatums[0] = leafValue;
+ leafIsnulls[0] = isnull;
+
+ so->reconTups[so->nPtrs] = heap_form_tuple(so->indexTupDesc,
+ leafDatums,
+ leafIsnulls);
+ pfree(leafDatums);
+ pfree(leafIsnulls);
+ }
+ else
+ {
+ so->reconTups[so->nPtrs] = heap_form_tuple(so->indexTupDesc,
+ &leafValue,
+ &isnull);
+ }
}
so->nPtrs++;
}
@@ -1019,6 +1074,10 @@ spgcanreturn(Relation index, int attno)
{
SpGistCache *cache;
+ /* INCLUDE attributes can always be fetched for index-only scans */
+ if (attno > 1)
+ return true;
+
/* We can do it if the opclass config function says so */
cache = spgGetCache(index);
diff --git a/src/backend/access/spgist/spgutils.c b/src/backend/access/spgist/spgutils.c
index 0efe05e552..5247c5b4b0 100644
--- a/src/backend/access/spgist/spgutils.c
+++ b/src/backend/access/spgist/spgutils.c
@@ -31,7 +31,18 @@
#include "utils/index_selfuncs.h"
#include "utils/lsyscache.h"
#include "utils/syscache.h"
+#include "access/itup.h"
+#include "access/detoast.h"
+#include "access/toast_internals.h"
+#include "access/heaptoast.h"
+#include "utils/expandeddatum.h"
+/* Does att's datatype allow packing into the 1-byte-header varlena format? */
+#define ATT_IS_PACKABLE(att) \
+ ((att)->attlen == -1 && (att)->attstorage != TYPSTORAGE_PLAIN)
+
+Size spgIncludedDataSize(TupleDesc tupleDesc, Datum *values,
+ bool *isnull, Size start);
/*
* SP-GiST handler function: return IndexAmRoutine with access method parameters
@@ -49,7 +60,7 @@ spghandler(PG_FUNCTION_ARGS)
amroutine->amcanorderbyop = true;
amroutine->amcanbackward = false;
amroutine->amcanunique = false;
- amroutine->amcanmulticol = false;
+ amroutine->amcanmulticol = true;
amroutine->amoptionalkey = true;
amroutine->amsearcharray = false;
amroutine->amsearchnulls = true;
@@ -57,7 +68,7 @@ spghandler(PG_FUNCTION_ARGS)
amroutine->amclusterable = false;
amroutine->ampredlocks = false;
amroutine->amcanparallel = false;
- amroutine->amcaninclude = false;
+ amroutine->amcaninclude = true;
amroutine->amusemaintenanceworkmem = false;
amroutine->amparallelvacuumoptions =
VACUUM_OPTION_PARALLEL_BULKDEL | VACUUM_OPTION_PARALLEL_COND_CLEANUP;
@@ -116,14 +127,21 @@ spgGetCache(Relation index)
cache = MemoryContextAllocZero(index->rd_indexcxt,
sizeof(SpGistCache));
- /* SPGiST doesn't support multi-column indexes */
- Assert(index->rd_att->natts == 1);
+ /*
+ * SPGiST should have one key column and can also have INCLUDE
+ * columns
+ */
+ if (IndexRelationGetNumberOfKeyAttributes(index) != 1)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("SPGiST index can have only one key column")));
/*
- * Get the actual data type of the indexed column from the index
- * tupdesc. We pass this to the opclass config function so that
- * polymorphic opclasses are possible.
+ * Get the actual data type of the key column from the index tupdesc.
+ * We pass this to the opclass config function so that polymorphic
+ * opclasses are possible.
*/
+
atttype = TupleDescAttr(index->rd_att, 0)->atttypid;
/* Call the config function to get config info for the opclass */
@@ -156,6 +174,7 @@ spgGetCache(Relation index)
fillTypeDesc(&cache->attPrefixType, cache->config.prefixType);
fillTypeDesc(&cache->attLabelType, cache->config.labelType);
+
/* Last, get the lastUsedPages data from the metapage */
metabuffer = ReadBuffer(index, SPGIST_METAPAGE_BLKNO);
LockBuffer(metabuffer, BUFFER_LOCK_SHARE);
@@ -177,7 +196,23 @@ spgGetCache(Relation index)
/* assume it's up to date */
cache = (SpGistCache *) index->rd_amcache;
}
+ /* Form descriptor for INCLUDE columns if any */
+ if (IndexRelationGetNumberOfAttributes(index) > 1)
+ {
+ int i;
+
+ cache->includeTupdesc = CreateTemplateTupleDesc(
+ IndexRelationGetNumberOfAttributes(index) - 1);
+ for (i = 0; i < IndexRelationGetNumberOfAttributes(index) - 1; i++)
+ {
+ TupleDescInitEntry(cache->includeTupdesc, i + 1, NULL,
+ TupleDescAttr(index->rd_att, i + 1)->atttypid,
+ -1, 0);
+ }
+ }
+ else
+ cache->includeTupdesc = NULL;
return cache;
}
@@ -190,6 +225,7 @@ initSpGistState(SpGistState *state, Relation index)
/* Get cached static information about index */
cache = spgGetCache(index);
+ state->includeTupdesc = cache->includeTupdesc;
state->config = cache->config;
state->attType = cache->attType;
state->attLeafType = cache->attLeafType;
@@ -603,8 +639,8 @@ spgoptions(Datum reloptions, bool validate)
/*
* Get the space needed to store a non-null datum of the indicated type.
- * Note the result is already rounded up to a MAXALIGN boundary.
- * Also, we follow the SPGiST convention that pass-by-val types are
+ * Note the result is not maxaligned and this should be done by the caller if
+ * needed. Also, we follow the SPGiST convention that pass-by-val types are
* just stored in their Datum representation (compare memcpyDatum).
*/
unsigned int
@@ -619,7 +655,7 @@ SpGistGetTypeSize(SpGistTypeDesc *att, Datum datum)
else
size = VARSIZE_ANY(datum);
- return MAXALIGN(size);
+ return size;
}
/*
@@ -642,36 +678,205 @@ memcpyDatum(void *target, SpGistTypeDesc *att, Datum datum)
}
/*
- * Construct a leaf tuple containing the given heap TID and datum value
+ * Private version of heap_compute_data_size with start address not
+ * at MAXALIGN boundary. The reason is that start address (and alignment)
+ * influence alignment of each of next values and overall size of INCLUDE
+ * data area in SpGiST leaf tuple. MAXALINGing first INCLUDE attribute is
+ * avoided for not to introduce unnecessary gap before it.
+ */
+Size
+spgIncludedDataSize(TupleDesc tupleDesc,
+ Datum *values,
+ bool *isnull, Size start)
+{
+ Size data_length = 0;
+ int i;
+ int numberOfAttributes = tupleDesc->natts;
+
+ data_length = start;
+ for (i = 0; i < numberOfAttributes; i++)
+ {
+ Datum val;
+ Form_pg_attribute atti;
+
+ if (isnull[i])
+ continue;
+
+ val = values[i];
+ atti = TupleDescAttr(tupleDesc, i);
+
+ if (ATT_IS_PACKABLE(atti) &&
+ VARATT_CAN_MAKE_SHORT(DatumGetPointer(val)))
+ {
+ /*
+ * we're anticipating converting to a short varlena header, so
+ * adjust length and don't count any alignment
+ */
+ data_length += VARATT_CONVERTED_SHORT_SIZE(DatumGetPointer(val));
+ }
+ else if (atti->attlen == -1 &&
+ VARATT_IS_EXTERNAL_EXPANDED(DatumGetPointer(val)))
+ {
+ /*
+ * we want to flatten the expanded value so that the constructed
+ * tuple doesn't depend on it
+ */
+ data_length = att_align_nominal(data_length, atti->attalign);
+ data_length += EOH_get_flat_size(DatumGetEOHP(val));
+ }
+ else
+ {
+ data_length = att_align_datum(data_length, atti->attalign,
+ atti->attlen, val);
+ data_length = att_addlength_datum(data_length, atti->attlen,
+ val);
+ }
+ }
+ return data_length - start;
+}
+
+/* Calculate overall leaf tuple size. SGLTHDRSZ is MAXALIGNed for backward
+ * compatibility and there might be a gap between header and key data. After
+ * key data there are no such gaps more than is is necessary for each value
+ * alignment. Overall result is MAXALIGNed which is anyway unavoidable
+ * when placing a tuple on a page.
+ */
+unsigned int
+spgLeafTupleSize(SpGistState *state, Datum *datum, bool *isnull)
+{
+ /* compute space needed, nullmask size and offset for INCLUDE attributes */
+ unsigned int size = SGLTHDRSZ;
+ unsigned int i;
+
+ if (!isnull[0])
+ /* key attribute size (not maxaligned) */
+ size += SpGistGetTypeSize(&state->attLeafType, datum[0]);
+
+ if (state->includeTupdesc)
+ {
+ Assert(state->includeTupdesc->natts);
+ if (state->includeTupdesc->natts + 1 >= INDEX_MAX_KEYS)
+ ereport(ERROR,
+ (errcode(ERRCODE_TOO_MANY_COLUMNS),
+ errmsg("number of index columns (%d) exceeds limit (%d)",
+ state->includeTupdesc->natts, INDEX_MAX_KEYS)));
+ /* nullmask size */
+ for (i = 1; i <= state->includeTupdesc->natts; i++)
+ {
+ if (isnull[i])
+ {
+ size += (state->includeTupdesc->natts / 8) + 1;
+ break;
+ }
+ }
+ /* overall INCLUDE attributes size each with added proper alignment. */
+ size += spgIncludedDataSize(state->includeTupdesc, datum + 1, isnull + 1, size);
+ }
+ return MAXALIGN(size);
+}
+
+/*
+ * Construct a leaf tuple containing the given heap TID, key data and INCLUDE
+ * columns data. Key data starts from MAXALIGN boundary for backward compatibility.
+ * Nullmask apply only to INCLUDE attributes and is placed just after key data if
+ * there is at least one NULL among INCLUDE attributes. It doesn't need alignment.
+ * Then all INCLUDE columns data follow aligned by their typealign-s.
*/
SpGistLeafTuple
spgFormLeafTuple(SpGistState *state, ItemPointer heapPtr,
- Datum datum, bool isnull)
+ Datum *datum, bool *isnull)
{
SpGistLeafTuple tup;
- unsigned int size;
+ unsigned int size = SGLTHDRSZ;
+ unsigned int include_offset = 0;
+ unsigned int nullmask_size = 0;
+ unsigned int data_offset = 0;
+ unsigned int data_size = 0;
+ uint16 tupmask = 0;
+ int i;
- /* compute space needed (note result is already maxaligned) */
- size = SGLTHDRSZ;
- if (!isnull)
- size += SpGistGetTypeSize(&state->attLeafType, datum);
+ /*
+ * Calculate space needed. If there are INCLUDE attributes also calculate
+ * sizes and offsets needed for heap_fill_tuple
+ */
+ if (!isnull[0])
+ /* key attribute size (not maxaligned) */
+ size += SpGistGetTypeSize(&state->attLeafType, datum[0]);
+
+ if (state->includeTupdesc)
+ {
+ Assert(state->includeTupdesc->natts);
+ if (state->includeTupdesc->natts + 1 >= INDEX_MAX_KEYS)
+ ereport(ERROR,
+ (errcode(ERRCODE_TOO_MANY_COLUMNS),
+ errmsg("number of index columns (%d) exceeds limit (%d)",
+ state->includeTupdesc->natts, INDEX_MAX_KEYS)));
+
+ include_offset = size;
+
+ for (i = 1; i <= state->includeTupdesc->natts; i++)
+ {
+ if (isnull[i])
+ {
+ nullmask_size = (state->includeTupdesc->natts / 8) + 1;
+ size += nullmask_size;
+ break;
+ }
+ }
+
+ /*
+ * Alignment of all INCLUDE attributes is counted inside data_size.
+ * data_offset itself is not aligned.
+ */
+ data_size = spgIncludedDataSize(state->includeTupdesc, datum + 1, isnull + 1, size);
+ data_offset = size;
+
+ size += data_size;
+ }
/*
- * Ensure that we can replace the tuple with a dead tuple later. This
- * test is unnecessary when !isnull, but let's be safe.
+ * Ensure that we can replace the tuple with a dead tuple later. This
+ * test is unnecessary when !isnull[0], but let's be safe.
*/
if (size < SGDTSIZE)
size = SGDTSIZE;
/* OK, form the tuple */
- tup = (SpGistLeafTuple) palloc0(size);
+ tup = (SpGistLeafTuple) palloc0(MAXALIGN(size));
- tup->size = size;
- tup->nextOffset = InvalidOffsetNumber;
+ tup->size = MAXALIGN(size);
+ SGLT_SET_OFFSET(tup->nextOffset, InvalidOffsetNumber);
tup->heapPtr = *heapPtr;
- if (!isnull)
- memcpyDatum(SGLTDATAPTR(tup), &state->attLeafType, datum);
+ if (!isnull[0])
+ memcpyDatum(SGLTDATAPTR(tup), &state->attLeafType, datum[0]);
+
+ /* Add INCLUDE columns data to leaf tuple if any. */
+ if (state->includeTupdesc)
+ {
+ /*
+ * The start of INCLUDE attributes tuple (include_offset) is next
+ * byte after end of a key value and is not required to be aligned.
+ * Nullmask is included without alignment and values alignment are
+ * done by heap_fill_tuple() automatically.
+ */
+ heap_fill_tuple(state->includeTupdesc, datum + 1, isnull + 1,
+ (char *) tup + data_offset,
+ data_size, &tupmask,
+ (nullmask_size ? (bits8 *) tup + include_offset : NULL));
+
+ if (nullmask_size)
+ SGLT_SET_CONTAINSNULLMASK(tup->nextOffset, 1);
+
+ /*
+ * We do this because heap_fill_tuple wants to initialize a "tupmask"
+ * which is used for HeapTuples, but the only relevant info is the
+ * "has variable attributes" field. We have already set the hasnull
+ * bit above.
+ */
+ if (tupmask & HEAP_HASVARWIDTH)
+ SGLT_SET_CONTAINSVARATT(tup->nextOffset, 1);
+ }
return tup;
}
@@ -688,10 +893,10 @@ spgFormNodeTuple(SpGistState *state, Datum label, bool isnull)
unsigned int size;
unsigned short infomask = 0;
- /* compute space needed (note result is already maxaligned) */
+ /* compute space needed */
size = SGNTHDRSZ;
if (!isnull)
- size += SpGistGetTypeSize(&state->attLabelType, label);
+ size += MAXALIGN(SpGistGetTypeSize(&state->attLabelType, label));
/*
* Here we make sure that the size will fit in the field reserved for it
@@ -735,7 +940,7 @@ spgFormInnerTuple(SpGistState *state, bool hasPrefix, Datum prefix,
/* Compute size needed */
if (hasPrefix)
- prefixSize = SpGistGetTypeSize(&state->attPrefixType, prefix);
+ prefixSize = MAXALIGN(SpGistGetTypeSize(&state->attPrefixType, prefix));
else
prefixSize = 0;
@@ -1046,3 +1251,133 @@ spgproperty(Oid index_oid, int attno,
return true;
}
+
+/*
+ * Convert an SpGist tuple into palloc'd Datum/isnull arrays.
+ *
+ */
+void
+spgDeformLeafTuple(SpGistLeafTuple tup, SpGistState *state, Datum *datum, bool *isnull,
+ bool key_isnull)
+{
+ unsigned int include_offset; /* offset of INCLUDE data */
+ int off;
+ bits8 *nullmask_ptr = NULL; /* ptr to null bitmap in tuple */
+ char *tp;
+ bool slow = false; /* can we use/set attcacheoff? */
+ int i;
+
+ if (key_isnull)
+ {
+ datum[0] = (Datum) 0;
+ isnull[0] = true;
+ }
+ else
+ {
+ datum[0] = SGLTDATUM(tup, state);
+ isnull[0] = false;
+ }
+
+ if (state->includeTupdesc)
+ {
+ Assert(state->includeTupdesc->natts);
+ if (state->includeTupdesc->natts + 1 >= INDEX_MAX_KEYS)
+ ereport(ERROR,
+ (errcode(ERRCODE_TOO_MANY_COLUMNS),
+ errmsg("number of index columns (%d) exceeds limit (%d)",
+ state->includeTupdesc->natts, INDEX_MAX_KEYS)));
+
+ include_offset = key_isnull ? SGLTHDRSZ : SGLTHDRSZ + SpGistGetTypeSize(&state->attLeafType, datum[0]);
+
+ tp = (char *) tup;
+ off = include_offset;
+
+ if (SGLT_GET_CONTAINSNULLMASK(tup->nextOffset))
+ {
+ nullmask_ptr = (bits8 *) tp + include_offset;
+ off += (state->includeTupdesc->natts) / 8 + 1;
+ }
+
+ if (state->attLeafType.attlen > 0 && !SGLT_GET_CONTAINSVARATT(tup->nextOffset) &&
+ !SGLT_GET_CONTAINSNULLMASK(tup->nextOffset))
+ /* can use attcacheoff for all attributes */
+ {
+ for (i = 1; i <= state->includeTupdesc->natts; i++)
+ {
+ Form_pg_attribute thisatt = TupleDescAttr(state->includeTupdesc, i - 1);
+
+ isnull[i] = false;
+ if (thisatt->attcacheoff >= 0)
+ off = thisatt->attcacheoff;
+ else
+ {
+ off = att_align_nominal(off, thisatt->attalign);
+ thisatt->attcacheoff = off;
+ }
+ datum[i] = fetchatt(thisatt, tp + off);
+ off = att_addlength_pointer(off, thisatt->attlen, tp + off);
+ }
+ }
+ else
+
+ /*
+ * general case: can use cache until first null or varlen
+ * attribute
+ */
+ {
+ if (state->attLeafType.attlen <= 0)
+ slow = true; /* can't use attcacheoff at all */
+
+ for (i = 1; i <= state->includeTupdesc->natts; i++)
+ {
+ Form_pg_attribute thisatt = TupleDescAttr(state->includeTupdesc, i - 1);
+
+ if (SGLT_GET_CONTAINSNULLMASK(tup->nextOffset))
+ {
+ if (att_isnull(i - 1, nullmask_ptr))
+ {
+ datum[i] = (Datum) 0;
+ isnull[i] = true;
+ slow = true; /* can't use attcacheoff anymore */
+ continue;
+ }
+ }
+
+ isnull[i] = false;
+
+ if (!slow && thisatt->attcacheoff >= 0)
+ off = thisatt->attcacheoff;
+ else if (thisatt->attlen == -1)
+ {
+ /*
+ * We can only cache the offset for a varlena attribute if
+ * the offset is already suitably aligned, so that there
+ * would be no pad bytes in any case: then the offset will
+ * be valid for either an aligned or unaligned value.
+ */
+ if (!slow && off == att_align_nominal(off, thisatt->attalign))
+ thisatt->attcacheoff = off;
+ else
+ {
+ off = att_align_pointer(off, thisatt->attalign, -1, tp + off);
+ slow = true;
+ }
+ }
+ else
+ {
+ /* not varlena, so safe to use att_align_nominal */
+ off = att_align_nominal(off, thisatt->attalign);
+
+ if (!slow)
+ thisatt->attcacheoff = off;
+ }
+
+ datum[i] = fetchatt(thisatt, tp + off);
+ off = att_addlength_pointer(off, thisatt->attlen, tp + off);
+
+ if (thisatt->attlen <= 0)
+ slow = true; /* can't use attcacheoff anymore */
+ }
+ }
+ }
+}
diff --git a/src/backend/access/spgist/spgvacuum.c b/src/backend/access/spgist/spgvacuum.c
index bd98707f3c..badda5f9e0 100644
--- a/src/backend/access/spgist/spgvacuum.c
+++ b/src/backend/access/spgist/spgvacuum.c
@@ -168,23 +168,28 @@ vacuumLeafPage(spgBulkDeleteState *bds, Relation index, Buffer buffer,
}
/* Form predecessor map, too */
- if (lt->nextOffset != InvalidOffsetNumber)
+ if (SGLT_GET_OFFSET(lt->nextOffset) != InvalidOffsetNumber)
{
/* paranoia about corrupted chain links */
- if (lt->nextOffset < FirstOffsetNumber ||
- lt->nextOffset > max ||
- predecessor[lt->nextOffset] != InvalidOffsetNumber)
+ if (SGLT_GET_OFFSET(lt->nextOffset) < FirstOffsetNumber ||
+ SGLT_GET_OFFSET(lt->nextOffset) > max ||
+ predecessor[SGLT_GET_OFFSET(lt->nextOffset)] != InvalidOffsetNumber)
elog(ERROR, "inconsistent tuple chain links in page %u of index \"%s\"",
BufferGetBlockNumber(buffer),
RelationGetRelationName(index));
- predecessor[lt->nextOffset] = i;
+ predecessor[SGLT_GET_OFFSET(lt->nextOffset)] = i;
}
}
else if (lt->tupstate == SPGIST_REDIRECT)
{
SpGistDeadTuple dt = (SpGistDeadTuple) lt;
- Assert(dt->nextOffset == InvalidOffsetNumber);
+ /*
+ * Dead tuple nextOffset is allowed to have any values of two
+ * highest bits in case it is inherited from SpGistLeafTuple where
+ * these bits have their own meaning.
+ */
+ Assert(SGLT_GET_OFFSET(dt->nextOffset) == InvalidOffsetNumber);
Assert(ItemPointerIsValid(&dt->pointer));
/*
@@ -201,7 +206,7 @@ vacuumLeafPage(spgBulkDeleteState *bds, Relation index, Buffer buffer,
}
else
{
- Assert(lt->nextOffset == InvalidOffsetNumber);
+ Assert(SGLT_GET_OFFSET(lt->nextOffset) == InvalidOffsetNumber);
}
}
@@ -250,7 +255,7 @@ vacuumLeafPage(spgBulkDeleteState *bds, Relation index, Buffer buffer,
prevLive = deletable[i] ? InvalidOffsetNumber : i;
/* scan down the chain ... */
- j = head->nextOffset;
+ j = SGLT_GET_OFFSET(head->nextOffset);
while (j != InvalidOffsetNumber)
{
SpGistLeafTuple lt;
@@ -301,7 +306,7 @@ vacuumLeafPage(spgBulkDeleteState *bds, Relation index, Buffer buffer,
interveningDeletable = false;
}
- j = lt->nextOffset;
+ j = SGLT_GET_OFFSET(lt->nextOffset);
}
if (prevLive == InvalidOffsetNumber)
@@ -366,7 +371,7 @@ vacuumLeafPage(spgBulkDeleteState *bds, Relation index, Buffer buffer,
lt = (SpGistLeafTuple) PageGetItem(page,
PageGetItemId(page, chainSrc[i]));
Assert(lt->tupstate == SPGIST_LIVE);
- lt->nextOffset = chainDest[i];
+ SGLT_SET_OFFSET(lt->nextOffset, chainDest[i]);
}
MarkBufferDirty(buffer);
diff --git a/src/backend/access/spgist/spgxlog.c b/src/backend/access/spgist/spgxlog.c
index 7be2291d07..4022e3af07 100644
--- a/src/backend/access/spgist/spgxlog.c
+++ b/src/backend/access/spgist/spgxlog.c
@@ -122,8 +122,8 @@ spgRedoAddLeaf(XLogReaderState *record)
head = (SpGistLeafTuple) PageGetItem(page,
PageGetItemId(page, xldata->offnumHeadLeaf));
- Assert(head->nextOffset == leafTupleHdr.nextOffset);
- head->nextOffset = xldata->offnumLeaf;
+ Assert(SGLT_GET_OFFSET(head->nextOffset) == SGLT_GET_OFFSET(leafTupleHdr.nextOffset));
+ SGLT_SET_OFFSET(head->nextOffset, xldata->offnumLeaf);
}
}
else
@@ -822,7 +822,7 @@ spgRedoVacuumLeaf(XLogReaderState *record)
lt = (SpGistLeafTuple) PageGetItem(page,
PageGetItemId(page, chainSrc[i]));
Assert(lt->tupstate == SPGIST_LIVE);
- lt->nextOffset = chainDest[i];
+ SGLT_SET_OFFSET(lt->nextOffset, chainDest[i]);
}
PageSetLSN(page, lsn);
diff --git a/src/include/access/spgist_private.h b/src/include/access/spgist_private.h
index 00b98ec6a0..75e09f6664 100644
--- a/src/include/access/spgist_private.h
+++ b/src/include/access/spgist_private.h
@@ -22,7 +22,6 @@
#include "utils/geo_decls.h"
#include "utils/relcache.h"
-
typedef struct SpGistOptions
{
int32 varlena_header_; /* varlena header (do not touch directly!) */
@@ -141,6 +140,7 @@ typedef struct SpGistState
SpGistTypeDesc attLeafType; /* type of leaf-tuple values */
SpGistTypeDesc attPrefixType; /* type of inner-tuple prefix values */
SpGistTypeDesc attLabelType; /* type of node label values */
+ TupleDesc includeTupdesc; /* tuple descriptor of INCLUDE columns */
char *deadTupleStorage; /* workspace for spgFormDeadTuple */
@@ -148,104 +148,6 @@ typedef struct SpGistState
bool isBuild; /* true if doing index build */
} SpGistState;
-typedef struct SpGistSearchItem
-{
- pairingheap_node phNode; /* pairing heap node */
- Datum value; /* value reconstructed from parent or
- * leafValue if heaptuple */
- void *traversalValue; /* opclass-specific traverse value */
- int level; /* level of items on this page */
- ItemPointerData heapPtr; /* heap info, if heap tuple */
- bool isNull; /* SearchItem is NULL item */
- bool isLeaf; /* SearchItem is heap item */
- bool recheck; /* qual recheck is needed */
- bool recheckDistances; /* distance recheck is needed */
-
- /* array with numberOfOrderBys entries */
- double distances[FLEXIBLE_ARRAY_MEMBER];
-} SpGistSearchItem;
-
-#define SizeOfSpGistSearchItem(n_distances) \
- (offsetof(SpGistSearchItem, distances) + sizeof(double) * (n_distances))
-
-/*
- * Private state of an index scan
- */
-typedef struct SpGistScanOpaqueData
-{
- SpGistState state; /* see above */
- pairingheap *scanQueue; /* queue of to be visited items */
- MemoryContext tempCxt; /* short-lived memory context */
- MemoryContext traversalCxt; /* single scan lifetime memory context */
-
- /* Control flags showing whether to search nulls and/or non-nulls */
- bool searchNulls; /* scan matches (all) null entries */
- bool searchNonNulls; /* scan matches (some) non-null entries */
-
- /* Index quals to be passed to opclass (null-related quals removed) */
- int numberOfKeys; /* number of index qualifier conditions */
- ScanKey keyData; /* array of index qualifier descriptors */
- int numberOfOrderBys; /* number of ordering operators */
- int numberOfNonNullOrderBys; /* number of ordering operators
- * with non-NULL arguments */
- ScanKey orderByData; /* array of ordering op descriptors */
- Oid *orderByTypes; /* array of ordering op return types */
- int *nonNullOrderByOffsets; /* array of offset of non-NULL
- * ordering keys in the original array */
- Oid indexCollation; /* collation of index column */
-
- /* Opclass defined functions: */
- FmgrInfo innerConsistentFn;
- FmgrInfo leafConsistentFn;
-
- /* Pre-allocated workspace arrays: */
- double *zeroDistances;
- double *infDistances;
-
- /* These fields are only used in amgetbitmap scans: */
- TIDBitmap *tbm; /* bitmap being filled */
- int64 ntids; /* number of TIDs passed to bitmap */
-
- /* These fields are only used in amgettuple scans: */
- bool want_itup; /* are we reconstructing tuples? */
- TupleDesc indexTupDesc; /* if so, tuple descriptor for them */
- int nPtrs; /* number of TIDs found on current page */
- int iPtr; /* index for scanning through same */
- ItemPointerData heapPtrs[MaxIndexTuplesPerPage]; /* TIDs from cur page */
- bool recheck[MaxIndexTuplesPerPage]; /* their recheck flags */
- bool recheckDistances[MaxIndexTuplesPerPage]; /* distance recheck
- * flags */
- HeapTuple reconTups[MaxIndexTuplesPerPage]; /* reconstructed tuples */
-
- /* distances (for recheck) */
- IndexOrderByDistance *distances[MaxIndexTuplesPerPage];
-
- /*
- * Note: using MaxIndexTuplesPerPage above is a bit hokey since
- * SpGistLeafTuples aren't exactly IndexTuples; however, they are larger,
- * so this is safe.
- */
-} SpGistScanOpaqueData;
-
-typedef SpGistScanOpaqueData *SpGistScanOpaque;
-
-/*
- * This struct is what we actually keep in index->rd_amcache. It includes
- * static configuration information as well as the lastUsedPages cache.
- */
-typedef struct SpGistCache
-{
- spgConfigOut config; /* filled in by opclass config method */
-
- SpGistTypeDesc attType; /* type of values to be indexed/restored */
- SpGistTypeDesc attLeafType; /* type of leaf-tuple values */
- SpGistTypeDesc attPrefixType; /* type of inner-tuple prefix values */
- SpGistTypeDesc attLabelType; /* type of node label values */
-
- SpGistLUPCache lastUsedPages; /* local storage of last-used info */
-} SpGistCache;
-
-
/*
* SPGiST tuple types. Note: inner, leaf, and dead tuple structs
* must have the same tupstate field in the same position! Real inner and
@@ -305,8 +207,8 @@ typedef SpGistInnerTupleData *SpGistInnerTuple;
* SPGiST node tuple: one node within an inner tuple
*
* Node tuples use the same header as ordinary Postgres IndexTuples, but
- * we do not use a null bitmap, because we know there is only one column
- * so the INDEX_NULL_MASK bit suffices. Also, pass-by-value datums are
+ * we do not use a null bitmap, because we know there is only one key column
+ * so the INDEX_NULL_MASK bit suffices. Also, pass-by-value datums are
* stored as a full Datum, the same convention as for inner tuple prefixes
* and leaf tuple datums.
*/
@@ -322,11 +224,13 @@ typedef SpGistNodeTupleData *SpGistNodeTuple;
PointerGetDatum(SGNTDATAPTR(x)))
/*
- * SPGiST leaf tuple: carries a datum and a heap tuple TID
+ * SPGiST leaf tuple: carries a key datum, a heap tuple TID and optional
+ * datums and nullmask of INCLUDE columns.
*
- * In the simplest case, the datum is the same as the indexed value; but
+ * In the simplest case, the key datum is the same as the indexed value; but
* it could also be a suffix or some other sort of delta that permits
* reconstruction given knowledge of the prefix path traversed to get here.
+ * Datums of INCLUDE columns are stored without modification.
*
* The size field is wider than could possibly be needed for an on-disk leaf
* tuple, but this allows us to form leaf tuples even when the datum is too
@@ -346,14 +250,43 @@ typedef SpGistNodeTupleData *SpGistNodeTuple;
* however, the SGDTSIZE limit ensures that's there's a Datum word there
* anyway, so SGLTDATUM can be applied safely as long as you don't do
* anything with the result.
+ *
+ * Minimum space to store SpGistLeafTuple plus ItemIdData on a page is 16 bytes,
+ * so 14 lower bits of nextOffset is enough to store tuple number in a chain
+ * on a page even if a page size is 64Kb. Two higher bits are to store per-tuple
+ * information for INCLUDE attributes: is there nulls mask exist, and are there
+ * any INCLUDE attributes of variable length type. If there are no INCLUDE
+ * columns these higher bits are not used.
+ *
+ * If there are INCLUDE columns, they are stored after a key value, each
+ * starting from its own typalign boundary. Unlike IndexTuple, first INCLUDE
+ * value does not need to start from MAXALIGN boundary, so SPGiST uses private
+ * routines to access them. Nullmask with size (number of INCLUDE columns)/8
+ * bytes is put without alignment between the key and the first INCLUDE column.
+ * If there is an alignment gap between them, nullmask has a good chance to fit
+ * into the gap, thus making its storage free of charge.
*/
+
typedef struct SpGistLeafTupleData
{
unsigned int tupstate:2, /* LIVE/REDIRECT/DEAD/PLACEHOLDER */
size:30; /* large enough for any palloc'able value */
- OffsetNumber nextOffset; /* next tuple in chain, or InvalidOffsetNumber */
+
+ /* ---------------
+ * nextOffset is laid out in the following fashion:
+ *
+ * 15th (high) bit: INCLUDE values has nulls
+ * 14th bit: INCLUDE values has var-length attributes
+ * 13-0 bit: number of next tuple in chain on a page, or InvalidOffsetNumber
+ * ---------------
+ */
+
+ unsigned short nextOffset; /* info for linking tuples in a chain on a leaf page,
+ and additional info for INCLUDE attributes */
ItemPointerData heapPtr; /* TID of represented heap tuple */
- /* leaf datum follows */
+ /* key column data follows */
+ /* nullmask of INCLUDE values follows if there are nulls in INCLUDE attributes*/
+ /* INCLUDE columns data follow if any */
} SpGistLeafTupleData;
typedef SpGistLeafTupleData *SpGistLeafTuple;
@@ -361,8 +294,17 @@ typedef SpGistLeafTupleData *SpGistLeafTuple;
#define SGLTHDRSZ MAXALIGN(sizeof(SpGistLeafTupleData))
#define SGLTDATAPTR(x) (((char *) (x)) + SGLTHDRSZ)
#define SGLTDATUM(x, s) ((s)->attLeafType.attbyval ? \
- *(Datum *) SGLTDATAPTR(x) : \
- PointerGetDatum(SGLTDATAPTR(x)))
+ *(Datum *) SGLTDATAPTR(x) : \
+ PointerGetDatum(SGLTDATAPTR(x)))
+/*
+ * Macros to access bit fields inside nextOffset independently.
+ */
+#define SGLT_GET_OFFSET(x) ( (x) & 0x3FFF )
+#define SGLT_GET_CONTAINSNULLMASK(x) ( (x) >> 15 )
+#define SGLT_GET_CONTAINSVARATT(x) ( ( (x) & 0x4000 ) >> 14 )
+#define SGLT_SET_OFFSET(x,o) ( (x) = ( (x) & 0xC000 ) | ( (o) & 0x3FFF) )
+#define SGLT_SET_CONTAINSNULLMASK(x,n) ( (x) = ( (n) << 15 ) | ( (x) & 0x3FFF ) )
+#define SGLT_SET_CONTAINSVARATT(x,v) ( (x) = ( (v) << 14 ) | ( (x) & 0xBFFF ) )
/*
* SPGiST dead tuple: declaration for examining non-live tuples
@@ -394,7 +336,6 @@ typedef SpGistDeadTupleData *SpGistDeadTuple;
* size plus sizeof(ItemIdData) (for the line pointer). This works correctly
* so long as tuple sizes are always maxaligned.
*/
-
/* Page capacity after allowing for fixed header and special space */
#define SPGIST_PAGE_CAPACITY \
MAXALIGN_DOWN(BLCKSZ - \
@@ -410,6 +351,105 @@ typedef SpGistDeadTupleData *SpGistDeadTuple;
Min(SpGistPageGetOpaque(p)->nPlaceholder, n) * \
(SGDTSIZE + sizeof(ItemIdData)))
+
+typedef struct SpGistSearchItem
+{
+ pairingheap_node phNode; /* pairing heap node */
+ Datum value; /* value reconstructed from parent or
+ * leafValue if heaptuple */
+ void *traversalValue; /* opclass-specific traverse value */
+ int level; /* level of items on this page */
+ ItemPointerData heapPtr; /* heap info, if heap tuple */
+ bool isNull; /* SearchItem is NULL item */
+ bool isLeaf; /* SearchItem is heap item */
+ bool recheck; /* qual recheck is needed */
+ bool recheckDistances; /* distance recheck is needed */
+ SpGistLeafTuple leafTuple;
+ /* array with numberOfOrderBys entries */
+ double distances[FLEXIBLE_ARRAY_MEMBER];
+} SpGistSearchItem;
+
+#define SizeOfSpGistSearchItem(n_distances) \
+ (offsetof(SpGistSearchItem, distances) + sizeof(double) * (n_distances))
+
+/*
+ * Private state of an index scan
+ */
+typedef struct SpGistScanOpaqueData
+{
+ SpGistState state; /* see above */
+ pairingheap *scanQueue; /* queue of to be visited items */
+ MemoryContext tempCxt; /* short-lived memory context */
+ MemoryContext traversalCxt; /* single scan lifetime memory context */
+
+ /* Control flags showing whether to search nulls and/or non-nulls */
+ bool searchNulls; /* scan matches (all) null entries */
+ bool searchNonNulls; /* scan matches (some) non-null entries */
+
+ /* Index quals to be passed to opclass (null-related quals removed) */
+ int numberOfKeys; /* number of index qualifier conditions */
+ ScanKey keyData; /* array of index qualifier descriptors */
+ int numberOfOrderBys; /* number of ordering operators */
+ int numberOfNonNullOrderBys; /* number of ordering operators
+ * with non-NULL arguments */
+ ScanKey orderByData; /* array of ordering op descriptors */
+ Oid *orderByTypes; /* array of ordering op return types */
+ int *nonNullOrderByOffsets; /* array of offset of non-NULL
+ * ordering keys in the original array */
+ Oid indexCollation; /* collation of index column */
+
+ /* Opclass defined functions: */
+ FmgrInfo innerConsistentFn;
+ FmgrInfo leafConsistentFn;
+
+ /* Pre-allocated workspace arrays: */
+ double *zeroDistances;
+ double *infDistances;
+
+ /* These fields are only used in amgetbitmap scans: */
+ TIDBitmap *tbm; /* bitmap being filled */
+ int64 ntids; /* number of TIDs passed to bitmap */
+
+ /* These fields are only used in amgettuple scans: */
+ bool want_itup; /* are we reconstructing tuples? */
+ TupleDesc indexTupDesc; /* if so, tuple descriptor for them */
+ int nPtrs; /* number of TIDs found on current page */
+ int iPtr; /* index for scanning through same */
+ ItemPointerData heapPtrs[MaxIndexTuplesPerPage]; /* TIDs from cur page */
+ bool recheck[MaxIndexTuplesPerPage]; /* their recheck flags */
+ bool recheckDistances[MaxIndexTuplesPerPage]; /* distance recheck
+ * flags */
+ HeapTuple reconTups[MaxIndexTuplesPerPage]; /* reconstructed tuples */
+
+ /* distances (for recheck) */
+ IndexOrderByDistance *distances[MaxIndexTuplesPerPage];
+
+ /*
+ * Note: using MaxIndexTuplesPerPage above is a bit hokey since
+ * SpGistLeafTuples aren't exactly IndexTuples; however, they are larger,
+ * so this is safe.
+ */
+} SpGistScanOpaqueData;
+
+typedef SpGistScanOpaqueData *SpGistScanOpaque;
+
+/*
+ * This struct is what we actually keep in index->rd_amcache. It includes
+ * static configuration information as well as the lastUsedPages cache.
+ */
+typedef struct SpGistCache
+{
+ spgConfigOut config; /* filled in by opclass config method */
+
+ SpGistTypeDesc attType; /* type of values to be indexed/restored */
+ SpGistTypeDesc attLeafType; /* type of leaf-tuple values */
+ SpGistTypeDesc attPrefixType; /* type of inner-tuple prefix values */
+ SpGistTypeDesc attLabelType; /* type of node label values */
+ TupleDesc includeTupdesc;
+
+ SpGistLUPCache lastUsedPages; /* local storage of last-used info */
+} SpGistCache;
+
/*
* XLOG stuff
*/
@@ -456,9 +496,10 @@ extern void SpGistInitPage(Page page, uint16 f);
extern void SpGistInitBuffer(Buffer b, uint16 f);
extern void SpGistInitMetapage(Page page);
extern unsigned int SpGistGetTypeSize(SpGistTypeDesc *att, Datum datum);
+extern unsigned int spgLeafTupleSize(SpGistState *state, Datum *datum, bool *isnull);
extern SpGistLeafTuple spgFormLeafTuple(SpGistState *state,
ItemPointer heapPtr,
- Datum datum, bool isnull);
+ Datum *datum, bool *isnull);
extern SpGistNodeTuple spgFormNodeTuple(SpGistState *state,
Datum label, bool isnull);
extern SpGistInnerTuple spgFormInnerTuple(SpGistState *state,
@@ -466,6 +507,8 @@ extern SpGistInnerTuple spgFormInnerTuple(SpGistState *state,
int nNodes, SpGistNodeTuple *nodes);
extern SpGistDeadTuple spgFormDeadTuple(SpGistState *state, int tupstate,
BlockNumber blkno, OffsetNumber offnum);
+extern void spgDeformLeafTuple(SpGistLeafTuple tup, SpGistState *state,
+ Datum *datum, bool *isnull, bool key_value_isnull);
extern Datum *spgExtractNodeLabels(SpGistState *state,
SpGistInnerTuple innerTuple);
extern OffsetNumber SpGistPageAddNewItem(SpGistState *state, Page page,
@@ -484,7 +527,7 @@ extern void spgPageIndexMultiDelete(SpGistState *state, Page page,
int firststate, int reststate,
BlockNumber blkno, OffsetNumber offnum);
extern bool spgdoinsert(Relation index, SpGistState *state,
- ItemPointer heapPtr, Datum datum, bool isnull);
+ ItemPointer heapPtr, Datum *datum, bool *isnull);
/* spgproc.c */
extern double *spg_key_orderbys_distances(Datum key, bool isLeaf,
diff --git a/src/test/regress/expected/amutils.out b/src/test/regress/expected/amutils.out
index d92a6d12c6..93e6a43b6d 100644
--- a/src/test/regress/expected/amutils.out
+++ b/src/test/regress/expected/amutils.out
@@ -169,9 +169,9 @@ select amname, prop, pg_indexam_has_property(a.oid, prop) as p
hash | bogus |
spgist | can_order | f
spgist | can_unique | f
- spgist | can_multi_col | f
+ spgist | can_multi_col | t
spgist | can_exclude | t
- spgist | can_include | f
+ spgist | can_include | t
spgist | bogus |
(36 rows)
diff --git a/src/test/regress/expected/index_including.out b/src/test/regress/expected/index_including.out
index 8e5d53e712..86510687c7 100644
--- a/src/test/regress/expected/index_including.out
+++ b/src/test/regress/expected/index_including.out
@@ -349,14 +349,13 @@ SELECT indexdef FROM pg_indexes WHERE tablename = 'tbl' ORDER BY indexname;
DROP TABLE tbl;
/*
- * 7. Check various AMs. All but btree and gist must fail.
+ * 7. Check various AMs. All but btree, gist and spgist must fail.
*/
CREATE TABLE tbl (c1 int,c2 int, c3 box, c4 box);
CREATE INDEX on tbl USING brin(c1, c2) INCLUDE (c3, c4);
ERROR: access method "brin" does not support included columns
CREATE INDEX on tbl USING gist(c3) INCLUDE (c1, c4);
CREATE INDEX on tbl USING spgist(c3) INCLUDE (c4);
-ERROR: access method "spgist" does not support included columns
CREATE INDEX on tbl USING gin(c1, c2) INCLUDE (c3, c4);
ERROR: access method "gin" does not support included columns
CREATE INDEX on tbl USING hash(c1, c2) INCLUDE (c3, c4);
diff --git a/src/test/regress/expected/index_including_spgist.out b/src/test/regress/expected/index_including_spgist.out
new file mode 100644
index 0000000000..213cce5c7c
--- /dev/null
+++ b/src/test/regress/expected/index_including_spgist.out
@@ -0,0 +1,143 @@
+/*
+ * 1.1. test CREATE INDEX with buffered build
+ */
+-- Regular index with included columns
+CREATE TABLE tbl_spgist (c1 int, c2 int, c3 int, c4 box);
+SET enable_seqscan TO off;
+-- size is chosen to exceed page size and trigger actual truncation
+INSERT INTO tbl_spgist SELECT x, 2*x, 3*x, box(point(x,x+1),point(2*x,2*x+1)) FROM generate_series(1,8000) AS x;
+CREATE INDEX tbl_spgist_idx ON tbl_spgist using spgist (c4) INCLUDE (c1,c2,c3);
+SELECT pg_get_indexdef(i.indexrelid)
+FROM pg_index i JOIN pg_class c ON i.indexrelid = c.oid
+WHERE i.indrelid = 'tbl_spgist'::regclass ORDER BY c.relname;
+ pg_get_indexdef
+-----------------------------------------------------------------------------------------
+ CREATE INDEX tbl_spgist_idx ON public.tbl_spgist USING spgist (c4) INCLUDE (c1, c2, c3)
+(1 row)
+
+SELECT * FROM tbl_spgist where c4 <@ box(point(1,1),point(10,10));
+ c1 | c2 | c3 | c4
+----+----+----+-------------
+ 1 | 2 | 3 | (2,3),(1,2)
+ 2 | 4 | 6 | (4,5),(2,3)
+ 3 | 6 | 9 | (6,7),(3,4)
+ 4 | 8 | 12 | (8,9),(4,5)
+(4 rows)
+
+SET enable_bitmapscan TO off;
+VACUUM ANALYZE tbl_spgist;
+EXPLAIN (costs off) SELECT * FROM tbl_spgist where c4 <@ box(point(1,1),point(10,10));
+ QUERY PLAN
+----------------------------------------------------
+ Index Only Scan using tbl_spgist_idx on tbl_spgist
+ Index Cond: (c4 <@ '(10,10),(1,1)'::box)
+(2 rows)
+
+SET enable_bitmapscan TO default;
+DROP TABLE tbl_spgist;
+/*
+ * 1.2. test CREATE INDEX with inserts
+ */
+-- Regular index with included columns
+CREATE TABLE tbl_spgist (c1 int, c2 int, c3 int, c4 box);
+-- size is chosen to exceed page size and trigger actual truncation
+CREATE INDEX tbl_spgist_idx ON tbl_spgist using spgist (c4) INCLUDE (c1,c2,c3);
+INSERT INTO tbl_spgist SELECT x, 2*x, 3*x, box(point(x,x+1),point(2*x,2*x+1)) FROM generate_series(1,8000) AS x;
+SELECT pg_get_indexdef(i.indexrelid)
+FROM pg_index i JOIN pg_class c ON i.indexrelid = c.oid
+WHERE i.indrelid = 'tbl_spgist'::regclass ORDER BY c.relname;
+ pg_get_indexdef
+-----------------------------------------------------------------------------------------
+ CREATE INDEX tbl_spgist_idx ON public.tbl_spgist USING spgist (c4) INCLUDE (c1, c2, c3)
+(1 row)
+
+SELECT * FROM tbl_spgist where c4 <@ box(point(1,1),point(10,10));
+ c1 | c2 | c3 | c4
+----+----+----+-------------
+ 1 | 2 | 3 | (2,3),(1,2)
+ 2 | 4 | 6 | (4,5),(2,3)
+ 3 | 6 | 9 | (6,7),(3,4)
+ 4 | 8 | 12 | (8,9),(4,5)
+(4 rows)
+
+SET enable_bitmapscan TO off;
+VACUUM ANALYZE tbl_spgist;
+EXPLAIN (costs off) SELECT * FROM tbl_spgist where c4 <@ box(point(1,1),point(10,10));
+ QUERY PLAN
+----------------------------------------------------
+ Index Only Scan using tbl_spgist_idx on tbl_spgist
+ Index Cond: (c4 <@ '(10,10),(1,1)'::box)
+(2 rows)
+
+SET enable_bitmapscan TO default;
+DROP TABLE tbl_spgist;
+/*
+ * 2. CREATE INDEX CONCURRENTLY
+ */
+CREATE TABLE tbl_spgist (c1 int, c2 int, c3 int, c4 box);
+INSERT INTO tbl_spgist SELECT x, 2*x, 3*x, box(point(x,x+1),point(2*x,2*x+1)) FROM generate_series(1,10) AS x;
+CREATE INDEX CONCURRENTLY tbl_spgist_idx ON tbl_spgist using spgist (c4) INCLUDE (c1,c2,c3);
+SELECT indexdef FROM pg_indexes WHERE tablename = 'tbl_spgist' ORDER BY indexname;
+ indexdef
+-----------------------------------------------------------------------------------------
+ CREATE INDEX tbl_spgist_idx ON public.tbl_spgist USING spgist (c4) INCLUDE (c1, c2, c3)
+(1 row)
+
+DROP TABLE tbl_spgist;
+/*
+ * 3. REINDEX
+ */
+CREATE TABLE tbl_spgist (c1 int, c2 int, c3 int, c4 box);
+INSERT INTO tbl_spgist SELECT x, 2*x, 3*x, box(point(x,x+1),point(2*x,2*x+1)) FROM generate_series(1,10) AS x;
+CREATE INDEX tbl_spgist_idx ON tbl_spgist using spgist (c4) INCLUDE (c1,c3);
+SELECT indexdef FROM pg_indexes WHERE tablename = 'tbl_spgist' ORDER BY indexname;
+ indexdef
+-------------------------------------------------------------------------------------
+ CREATE INDEX tbl_spgist_idx ON public.tbl_spgist USING spgist (c4) INCLUDE (c1, c3)
+(1 row)
+
+REINDEX INDEX tbl_spgist_idx;
+SELECT indexdef FROM pg_indexes WHERE tablename = 'tbl_spgist' ORDER BY indexname;
+ indexdef
+-------------------------------------------------------------------------------------
+ CREATE INDEX tbl_spgist_idx ON public.tbl_spgist USING spgist (c4) INCLUDE (c1, c3)
+(1 row)
+
+ALTER TABLE tbl_spgist DROP COLUMN c1;
+SELECT indexdef FROM pg_indexes WHERE tablename = 'tbl_spgist' ORDER BY indexname;
+ indexdef
+----------
+(0 rows)
+
+DROP TABLE tbl_spgist;
+/*
+ * 4. Update, delete values in indexed table.
+ */
+CREATE TABLE tbl_spgist (c1 int, c2 int, c3 int, c4 box);
+INSERT INTO tbl_spgist SELECT x, 2*x, 3*x, box(point(x,x+1),point(2*x,2*x+1)) FROM generate_series(1,10) AS x;
+CREATE INDEX tbl_spgist_idx ON tbl_spgist using spgist (c4) INCLUDE (c1,c3);
+UPDATE tbl_spgist SET c1 = 100 WHERE c1 = 2;
+UPDATE tbl_spgist SET c1 = 1 WHERE c1 = 3;
+DELETE FROM tbl_spgist WHERE c1 = 5 OR c3 = 12;
+DROP TABLE tbl_spgist;
+/*
+ * 5. Alter column type.
+ */
+CREATE TABLE tbl_spgist (c1 int, c2 int, c3 int, c4 box);
+INSERT INTO tbl_spgist SELECT x, 2*x, 3*x, box(point(x,x+1),point(2*x,2*x+1)) FROM generate_series(1,10) AS x;
+CREATE INDEX tbl_spgist_idx ON tbl_spgist using spgist (c4) INCLUDE (c1,c3);
+ALTER TABLE tbl_spgist ALTER c1 TYPE bigint;
+ALTER TABLE tbl_spgist ALTER c3 TYPE bigint;
+\d tbl_spgist
+ Table "public.tbl_spgist"
+ Column | Type | Collation | Nullable | Default
+--------+---------+-----------+----------+---------
+ c1 | bigint | | |
+ c2 | integer | | |
+ c3 | bigint | | |
+ c4 | box | | |
+Indexes:
+ "tbl_spgist_idx" spgist (c4) INCLUDE (c1, c3)
+
+RESET enable_seqscan;
+DROP TABLE tbl_spgist;
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 026ea880cd..985458a1a8 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -50,7 +50,7 @@ test: copy copyselect copydml insert insert_conflict
# ----------
test: create_misc create_operator create_procedure
# These depend on create_misc and create_operator
-test: create_index create_index_spgist create_view index_including index_including_gist
+test: create_index create_index_spgist create_view index_including index_including_gist index_including_spgist
# ----------
# Another group of parallel tests
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index 979d926119..f3df961535 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -68,6 +68,7 @@ test: create_index_spgist
test: create_view
test: index_including
test: index_including_gist
+test: index_including_spgist
test: create_aggregate
test: create_function_3
test: create_cast
diff --git a/src/test/regress/sql/index_including.sql b/src/test/regress/sql/index_including.sql
index 7e517483ad..44b340053b 100644
--- a/src/test/regress/sql/index_including.sql
+++ b/src/test/regress/sql/index_including.sql
@@ -182,7 +182,7 @@ SELECT indexdef FROM pg_indexes WHERE tablename = 'tbl' ORDER BY indexname;
DROP TABLE tbl;
/*
- * 7. Check various AMs. All but btree and gist must fail.
+ * 7. Check various AMs. All but btree, gist and spgist must fail.
*/
CREATE TABLE tbl (c1 int,c2 int, c3 box, c4 box);
CREATE INDEX on tbl USING brin(c1, c2) INCLUDE (c3, c4);
diff --git a/src/test/regress/sql/index_including_spgist.sql b/src/test/regress/sql/index_including_spgist.sql
new file mode 100644
index 0000000000..38ace74d4e
--- /dev/null
+++ b/src/test/regress/sql/index_including_spgist.sql
@@ -0,0 +1,84 @@
+/*
+ * 1.1. test CREATE INDEX with buffered build
+ */
+
+-- Regular index with included columns
+CREATE TABLE tbl_spgist (c1 int, c2 int, c3 int, c4 box);
+SET enable_seqscan TO off;
+-- size is chosen to exceed page size and trigger actual truncation
+INSERT INTO tbl_spgist SELECT x, 2*x, 3*x, box(point(x,x+1),point(2*x,2*x+1)) FROM generate_series(1,8000) AS x;
+CREATE INDEX tbl_spgist_idx ON tbl_spgist using spgist (c4) INCLUDE (c1,c2,c3);
+SELECT pg_get_indexdef(i.indexrelid)
+FROM pg_index i JOIN pg_class c ON i.indexrelid = c.oid
+WHERE i.indrelid = 'tbl_spgist'::regclass ORDER BY c.relname;
+SELECT * FROM tbl_spgist where c4 <@ box(point(1,1),point(10,10));
+SET enable_bitmapscan TO off;
+VACUUM ANALYZE tbl_spgist;
+EXPLAIN (costs off) SELECT * FROM tbl_spgist where c4 <@ box(point(1,1),point(10,10));
+SET enable_bitmapscan TO default;
+DROP TABLE tbl_spgist;
+
+/*
+ * 1.2. test CREATE INDEX with inserts
+ */
+
+-- Regular index with included columns
+CREATE TABLE tbl_spgist (c1 int, c2 int, c3 int, c4 box);
+-- size is chosen to exceed page size and trigger actual truncation
+CREATE INDEX tbl_spgist_idx ON tbl_spgist using spgist (c4) INCLUDE (c1,c2,c3);
+INSERT INTO tbl_spgist SELECT x, 2*x, 3*x, box(point(x,x+1),point(2*x,2*x+1)) FROM generate_series(1,8000) AS x;
+SELECT pg_get_indexdef(i.indexrelid)
+FROM pg_index i JOIN pg_class c ON i.indexrelid = c.oid
+WHERE i.indrelid = 'tbl_spgist'::regclass ORDER BY c.relname;
+SELECT * FROM tbl_spgist where c4 <@ box(point(1,1),point(10,10));
+SET enable_bitmapscan TO off;
+VACUUM ANALYZE tbl_spgist;
+EXPLAIN (costs off) SELECT * FROM tbl_spgist where c4 <@ box(point(1,1),point(10,10));
+SET enable_bitmapscan TO default;
+DROP TABLE tbl_spgist;
+
+/*
+ * 2. CREATE INDEX CONCURRENTLY
+ */
+CREATE TABLE tbl_spgist (c1 int, c2 int, c3 int, c4 box);
+INSERT INTO tbl_spgist SELECT x, 2*x, 3*x, box(point(x,x+1),point(2*x,2*x+1)) FROM generate_series(1,10) AS x;
+CREATE INDEX CONCURRENTLY tbl_spgist_idx ON tbl_spgist using spgist (c4) INCLUDE (c1,c2,c3);
+SELECT indexdef FROM pg_indexes WHERE tablename = 'tbl_spgist' ORDER BY indexname;
+DROP TABLE tbl_spgist;
+
+
+/*
+ * 3. REINDEX
+ */
+CREATE TABLE tbl_spgist (c1 int, c2 int, c3 int, c4 box);
+INSERT INTO tbl_spgist SELECT x, 2*x, 3*x, box(point(x,x+1),point(2*x,2*x+1)) FROM generate_series(1,10) AS x;
+CREATE INDEX tbl_spgist_idx ON tbl_spgist using spgist (c4) INCLUDE (c1,c3);
+SELECT indexdef FROM pg_indexes WHERE tablename = 'tbl_spgist' ORDER BY indexname;
+REINDEX INDEX tbl_spgist_idx;
+SELECT indexdef FROM pg_indexes WHERE tablename = 'tbl_spgist' ORDER BY indexname;
+ALTER TABLE tbl_spgist DROP COLUMN c1;
+SELECT indexdef FROM pg_indexes WHERE tablename = 'tbl_spgist' ORDER BY indexname;
+DROP TABLE tbl_spgist;
+
+/*
+ * 4. Update, delete values in indexed table.
+ */
+CREATE TABLE tbl_spgist (c1 int, c2 int, c3 int, c4 box);
+INSERT INTO tbl_spgist SELECT x, 2*x, 3*x, box(point(x,x+1),point(2*x,2*x+1)) FROM generate_series(1,10) AS x;
+CREATE INDEX tbl_spgist_idx ON tbl_spgist using spgist (c4) INCLUDE (c1,c3);
+UPDATE tbl_spgist SET c1 = 100 WHERE c1 = 2;
+UPDATE tbl_spgist SET c1 = 1 WHERE c1 = 3;
+DELETE FROM tbl_spgist WHERE c1 = 5 OR c3 = 12;
+DROP TABLE tbl_spgist;
+
+/*
+ * 5. Alter column type.
+ */
+CREATE TABLE tbl_spgist (c1 int, c2 int, c3 int, c4 box);
+INSERT INTO tbl_spgist SELECT x, 2*x, 3*x, box(point(x,x+1),point(2*x,2*x+1)) FROM generate_series(1,10) AS x;
+CREATE INDEX tbl_spgist_idx ON tbl_spgist using spgist (c4) INCLUDE (c1,c3);
+ALTER TABLE tbl_spgist ALTER c1 TYPE bigint;
+ALTER TABLE tbl_spgist ALTER c3 TYPE bigint;
+\d tbl_spgist
+RESET enable_seqscan;
+DROP TABLE tbl_spgist;
--
2.28.0
v1-0002-Add-VACUUM-ANALYZE-to-index-including-test.patchapplication/octet-stream; name=v1-0002-Add-VACUUM-ANALYZE-to-index-including-test.patchDownload
From eb0ed1054b766bd110b0d1675a93065c0185a60a Mon Sep 17 00:00:00 2001
From: Pavel Borisov <pashkin.elfe@gmail.com>
Date: Thu, 27 Aug 2020 19:55:37 +0400
Subject: [PATCH v1] Add VACUUM ANALYZE to index including test
---
src/test/regress/expected/index_including.out | 1 +
src/test/regress/sql/index_including.sql | 1 +
2 files changed, 2 insertions(+)
diff --git a/src/test/regress/expected/index_including.out b/src/test/regress/expected/index_including.out
index 8e5d53e712..6a2a13ffa2 100644
--- a/src/test/regress/expected/index_including.out
+++ b/src/test/regress/expected/index_including.out
@@ -146,6 +146,7 @@ select * from tbl where (c1,c2,c3) < (2,5,1);
-- row comparison that compares high key at page boundary
SET enable_seqscan = off;
+VACUUM ANALYZE tbl;
explain (costs off)
select * from tbl where (c1,c2,c3) < (262,1,1) limit 1;
QUERY PLAN
diff --git a/src/test/regress/sql/index_including.sql b/src/test/regress/sql/index_including.sql
index 7e517483ad..1f300fe3b6 100644
--- a/src/test/regress/sql/index_including.sql
+++ b/src/test/regress/sql/index_including.sql
@@ -78,6 +78,7 @@ select * from tbl where (c1,c2,c3) < (2,5,1);
select * from tbl where (c1,c2,c3) < (2,5,1);
-- row comparison that compares high key at page boundary
SET enable_seqscan = off;
+VACUUM ANALYZE tbl;
explain (costs off)
select * from tbl where (c1,c2,c3) < (262,1,1) limit 1;
select * from tbl where (c1,c2,c3) < (262,1,1) limit 1;
--
2.28.0
27 авг. 2020 г., в 21:03, Pavel Borisov <pashkin.elfe@gmail.com> написал(а):
see v8
For me is the only concerning point is putting nullmask and varatt bits into tuple->nextOffset.
But, probably, we can go with this.
But let's change macro a bit. When I see
SGLT_SET_OFFSET(leafTuple->nextOffset, InvalidOffsetNumber);
I expect that leafTuple->nextOffset is function argument by value and will not be changed.
For example see ItemPointerSetOffsetNumber() - it's not exposing ip_posid.
Also, I'd propose instead of
*(leafChainDatums + i * natts) and leafChainIsnulls + i * natts
using something like
int some_index = i * natts;
leafChainDatumsp[some_index] and &leafChainIsnulls[some_index]
But, probably, it's a matter of taste...
Also I'm not sure would it be helpful to use instead of
isnull[0] and leafDatum[0]
more complex
#define SpgKeyIndex 0
isnull[SpgKeyIndex] and leafDatum[SpgKeyIndex]
There is so many [0] in the patch...
Thanks!
Best regards, Andrey Borodin.
But let's change macro a bit. When I see
SGLT_SET_OFFSET(leafTuple->nextOffset, InvalidOffsetNumber);
I expect that leafTuple->nextOffset is function argument by value and will
not be changed.
For example see ItemPointerSetOffsetNumber() - it's not exposing ip_posid.Also, I'd propose instead of
*(leafChainDatums + i * natts) and leafChainIsnulls + i * natts
using something like
int some_index = i * natts;
leafChainDatumsp[some_index] and &leafChainIsnulls[some_index]But, probably, it's a matter of taste...
Also I'm not sure would it be helpful to use instead of
isnull[0] and leafDatum[0]
more complex
#define SpgKeyIndex 0
isnull[SpgKeyIndex] and leafDatum[SpgKeyIndex]There is so many [0] in the patch...
I agree with all of your proposals and integrated them into v9.
Thank you very much!
--
Best regards,
Pavel Borisov
Postgres Professional: http://postgrespro.com <http://www.postgrespro.com>
Attachments:
v9-0001-Covering-SP-GiST-index-support-for-INCLUDE-column.patchapplication/octet-stream; name=v9-0001-Covering-SP-GiST-index-support-for-INCLUDE-column.patchDownload
From 717f09b9a7b94af1111a6812a13c124f1a600bf4 Mon Sep 17 00:00:00 2001
From: Pavel Borisov <pashkin.elfe@gmail.com>
Date: Mon, 31 Aug 2020 15:38:11 +0400
Subject: [PATCH v9] Covering SP-GiST index - support for INCLUDE columns
Adding INCLUDE columns for SPGiST index is intended to increase the speed of queries by making scans index-only likewise
in btree and GiST index. These columns are added only to leaf tuples and they are not used in index tree search but they
can be fetched during index scan.
The other point of INCLUDE columns is to overcome SP-GiST limitation of being single-column in principle. I.e. in certain
cases a single covering SP-GiST index can replace several separate ones with less disk space and shared buffers
consumption, faster, update etc. Also, any data types without SP-GiST supported opclasses can be included.
Discussion: https://www.postgresql.org/message-id/flat/CALT9ZEFi-vMp4faht9f9Junb1nO3NOSjhpxTmbm1UGLMsLqiEQ@mail.gmail.com
---
doc/src/sgml/indices.sgml | 4 +-
doc/src/sgml/ref/create_index.sgml | 4 +-
doc/src/sgml/spgist.sgml | 8 +
src/backend/access/spgist/README | 21 +-
src/backend/access/spgist/spgdoinsert.c | 175 +++++---
src/backend/access/spgist/spginsert.c | 5 +-
src/backend/access/spgist/spgscan.c | 87 +++-
src/backend/access/spgist/spgutils.c | 389 ++++++++++++++++--
src/backend/access/spgist/spgvacuum.c | 25 +-
src/backend/access/spgist/spgxlog.c | 6 +-
src/include/access/spgist_private.h | 273 +++++++-----
src/test/regress/expected/amutils.out | 4 +-
src/test/regress/expected/index_including.out | 3 +-
.../expected/index_including_spgist.out | 143 +++++++
src/test/regress/parallel_schedule | 2 +-
src/test/regress/serial_schedule | 1 +
src/test/regress/sql/index_including.sql | 2 +-
.../regress/sql/index_including_spgist.sql | 84 ++++
18 files changed, 997 insertions(+), 239 deletions(-)
create mode 100644 src/test/regress/expected/index_including_spgist.out
create mode 100644 src/test/regress/sql/index_including_spgist.sql
diff --git a/doc/src/sgml/indices.sgml b/doc/src/sgml/indices.sgml
index 28adaba72d..c89cc6cb08 100644
--- a/doc/src/sgml/indices.sgml
+++ b/doc/src/sgml/indices.sgml
@@ -1194,8 +1194,8 @@ CREATE UNIQUE INDEX tab_x_y ON tab(x) INCLUDE (y);
likely to not need to access the heap. If the heap tuple must be visited
anyway, it costs nothing more to get the column's value from there.
Other restrictions are that expressions are not currently supported as
- included columns, and that only B-tree and GiST indexes currently support
- included columns.
+ included columns, and that only B-tree, GiST and SP-GiST indexes currently
+ support included columns.
</para>
<para>
diff --git a/doc/src/sgml/ref/create_index.sgml b/doc/src/sgml/ref/create_index.sgml
index ff87b2d28f..3d360bcf47 100644
--- a/doc/src/sgml/ref/create_index.sgml
+++ b/doc/src/sgml/ref/create_index.sgml
@@ -187,8 +187,8 @@ CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ [ IF NOT EXISTS ] <replaceable class=
</para>
<para>
- Currently, the B-tree and the GiST index access methods support this
- feature. In B-tree and the GiST indexes, the values of columns listed
+ Currently, the B-tree, GiST and SP-GiST index access methods support
+ this feature. In these indexes, the values of columns listed
in the <literal>INCLUDE</literal> clause are included in leaf tuples
which correspond to heap tuples, but are not included in upper-level
index entries used for tree navigation.
diff --git a/doc/src/sgml/spgist.sgml b/doc/src/sgml/spgist.sgml
index 0e04a08679..868a140a6a 100644
--- a/doc/src/sgml/spgist.sgml
+++ b/doc/src/sgml/spgist.sgml
@@ -240,6 +240,14 @@
inner tuples that are passed through to reach the leaf level.
</para>
+ <para>
+ In case when <acronym>SP-GiST</acronym> index is created with
+ <literal>INCLUDE</literal> clause i.e. covering index, leaf tuples also
+ contain data from included columns. This data is stored uncompressed and can have
+ data types without any SP-GiST operator class.
+
+ </para>
+
<para>
Inner tuples are more complex, since they are branching points in the
search tree. Each inner tuple contains a set of one or more
diff --git a/src/backend/access/spgist/README b/src/backend/access/spgist/README
index b55b073832..55b515f03d 100644
--- a/src/backend/access/spgist/README
+++ b/src/backend/access/spgist/README
@@ -73,9 +73,22 @@ Leaf tuple consists of:
Example:
radix tree - the rest of string (postfix)
quad and k-d tree - the point itself
-
ItemPointer to the heap
-
+ nextOffset number of next leaf tuple in a chain on a leaf page
+ optional nullmask for INCLUDE columns
+ optional INCLUDE columns values
+
+Leaf tuple layout changed since PostgreSQL version 14 to support INCLUDE
+columns but in a way that doesn't change the header and the key value
+placement in a tuple. So indexes created earlier remain fully supported.
+
+Also it is intended to be laid out with minimum possible gaps to make index
+smaller. I.e. first header of 12 bytes, then a key value starting from
+maxalign boundary, then just immediately nulls mask bytes, then INCLUDE
+attributes each starting from its typealign boundary. So in many cases,
+nullmask is stored free of charge and tuple occupy minimum possible space
+(with exception of gap before key value which starts from maxalign for
+compatibility).
NULLS HANDLING
@@ -90,6 +103,10 @@ Insertions and searches in the nulls tree do not use any of the
opclass-supplied functions, but just use hardwired logic comparable to
AllTheSame cases in the normal tree.
+For INCLUDE attributes nulls are handled in ordinary per leaf-tuple way i.e.
+if null mask presence bit in a header is set, nullmask is added just after
+key value before the first INCLUDE attribute. Note that nullmask presence
+bit and nullmask itself apply only to INCLUDE attributes.
INSERTION ALGORITHM
diff --git a/src/backend/access/spgist/spgdoinsert.c b/src/backend/access/spgist/spgdoinsert.c
index 934d65b89f..335bbdb9dc 100644
--- a/src/backend/access/spgist/spgdoinsert.c
+++ b/src/backend/access/spgist/spgdoinsert.c
@@ -22,7 +22,7 @@
#include "miscadmin.h"
#include "storage/bufmgr.h"
#include "utils/rel.h"
-
+#include "access/htup_details.h"
/*
* SPPageDesc tracks all info about a page we are inserting into. In some
@@ -220,7 +220,7 @@ addLeafTuple(Relation index, SpGistState *state, SpGistLeafTuple leafTuple,
SpGistBlockIsRoot(current->blkno))
{
/* Tuple is not part of a chain */
- leafTuple->nextOffset = InvalidOffsetNumber;
+ SGLT_SET_OFFSET(leafTuple, InvalidOffsetNumber);
current->offnum = SpGistPageAddNewItem(state, current->page,
(Item) leafTuple, leafTuple->size,
NULL, false);
@@ -253,7 +253,7 @@ addLeafTuple(Relation index, SpGistState *state, SpGistLeafTuple leafTuple,
PageGetItemId(current->page, current->offnum));
if (head->tupstate == SPGIST_LIVE)
{
- leafTuple->nextOffset = head->nextOffset;
+ SGLT_SET_OFFSET(leafTuple, SGLT_GET_OFFSET(head));
offnum = SpGistPageAddNewItem(state, current->page,
(Item) leafTuple, leafTuple->size,
NULL, false);
@@ -264,14 +264,14 @@ addLeafTuple(Relation index, SpGistState *state, SpGistLeafTuple leafTuple,
*/
head = (SpGistLeafTuple) PageGetItem(current->page,
PageGetItemId(current->page, current->offnum));
- head->nextOffset = offnum;
+ SGLT_SET_OFFSET(head, offnum);
xlrec.offnumLeaf = offnum;
xlrec.offnumHeadLeaf = current->offnum;
}
else if (head->tupstate == SPGIST_DEAD)
{
- leafTuple->nextOffset = InvalidOffsetNumber;
+ SGLT_SET_OFFSET(leafTuple, InvalidOffsetNumber);
PageIndexTupleDelete(current->page, current->offnum);
if (PageAddItem(current->page,
(Item) leafTuple, leafTuple->size,
@@ -362,13 +362,13 @@ checkSplitConditions(Relation index, SpGistState *state,
{
/* We could see a DEAD tuple as first/only chain item */
Assert(i == current->offnum);
- Assert(it->nextOffset == InvalidOffsetNumber);
+ Assert(SGLT_GET_OFFSET(it) == InvalidOffsetNumber);
/* Don't count it in result, because it won't go to other page */
}
else
elog(ERROR, "unexpected SPGiST tuple state: %d", it->tupstate);
- i = it->nextOffset;
+ i = SGLT_GET_OFFSET(it);
}
*nToSplit = n;
@@ -437,7 +437,7 @@ moveLeafs(Relation index, SpGistState *state,
{
/* We could see a DEAD tuple as first/only chain item */
Assert(i == current->offnum);
- Assert(it->nextOffset == InvalidOffsetNumber);
+ Assert(SGLT_GET_OFFSET(it) == InvalidOffsetNumber);
/* We don't want to move it, so don't count it in size */
toDelete[nDelete] = i;
nDelete++;
@@ -446,7 +446,7 @@ moveLeafs(Relation index, SpGistState *state,
else
elog(ERROR, "unexpected SPGiST tuple state: %d", it->tupstate);
- i = it->nextOffset;
+ i = SGLT_GET_OFFSET(it);
}
/* Find a leaf page that will hold them */
@@ -475,7 +475,7 @@ moveLeafs(Relation index, SpGistState *state,
* don't care). We're modifying the tuple on the source page
* here, but it's okay since we're about to delete it.
*/
- it->nextOffset = r;
+ SGLT_SET_OFFSET(it, r);
r = SpGistPageAddNewItem(state, npage, (Item) it, it->size,
&startOffset, false);
@@ -490,7 +490,7 @@ moveLeafs(Relation index, SpGistState *state,
}
/* add the new tuple as well */
- newLeafTuple->nextOffset = r;
+ SGLT_SET_OFFSET(newLeafTuple, r);
r = SpGistPageAddNewItem(state, npage,
(Item) newLeafTuple, newLeafTuple->size,
&startOffset, false);
@@ -709,6 +709,11 @@ doPickSplit(Relation index, SpGistState *state,
int nToDelete,
nToInsert,
maxToInclude;
+ Datum *leafChainDatums;
+ bool *leafChainIsnulls;
+ const int natts = IndexRelationGetNumberOfAttributes(index);
+ int chainStoreIndex; /* Index for start of datums/isnulls for a
+ current chain item */
in.level = level;
@@ -723,14 +728,16 @@ doPickSplit(Relation index, SpGistState *state,
toInsert = (OffsetNumber *) palloc(sizeof(OffsetNumber) * n);
newLeafs = (SpGistLeafTuple *) palloc(sizeof(SpGistLeafTuple) * n);
leafPageSelect = (uint8 *) palloc(sizeof(uint8) * n);
-
STORE_STATE(state, xlrec.stateSrc);
+ leafChainDatums = (Datum *) palloc(n * natts * sizeof(Datum));
+ leafChainIsnulls = (bool *) palloc(n * natts * sizeof(bool));
+
/*
- * Form list of leaf tuples which will be distributed as split result;
- * also, count up the amount of space that will be freed from current.
- * (Note that in the non-root case, we won't actually delete the old
- * tuples, only replace them with redirects or placeholders.)
+ * Collect leaf tuples which will be distributed as split result; also,
+ * count up the amount of space that will be freed from current. (Note
+ * that in the non-root case, we won't actually delete the old tuples,
+ * only replace them with redirects or placeholders.)
*
* Note: the SGLTDATUM calls here are safe even when dealing with a nulls
* page. For a pass-by-value data type we will fetch a word that must
@@ -738,7 +745,15 @@ doPickSplit(Relation index, SpGistState *state,
* tuples must have size at least SGDTSIZE). For a pass-by-reference type
* we are just computing a pointer that isn't going to get dereferenced.
* So it's not worth guarding the calls with isNulls checks.
+ *
+ * Datums and isnulls of all leaf tuple attributes in the chain are
+ * collected into 2-d arrays: (number of tuples in the chain) x (number of
+ * attributes) The first attribute is key, the other - INCLUDE attributes (if
+ * any). After picksplit we need to form new leaf tuples as the key attribute
+ * length can change which can affect the alignment of every INCLUDE
+ * attribute.
*/
+
nToInsert = 0;
nToDelete = 0;
spaceToDelete = 0;
@@ -759,6 +774,9 @@ doPickSplit(Relation index, SpGistState *state,
{
in.datums[nToInsert] = SGLTDATUM(it, state);
heapPtrs[nToInsert] = it->heapPtr;
+ chainStoreIndex = nToInsert * natts;
+ spgDeformLeafTuple(it, state, &leafChainDatums[chainStoreIndex],
+ &leafChainIsnulls[chainStoreIndex], isNulls);
nToInsert++;
toDelete[nToDelete] = i;
nToDelete++;
@@ -784,6 +802,9 @@ doPickSplit(Relation index, SpGistState *state,
{
in.datums[nToInsert] = SGLTDATUM(it, state);
heapPtrs[nToInsert] = it->heapPtr;
+ chainStoreIndex = nToInsert * natts;
+ spgDeformLeafTuple(it, state, &leafChainDatums[chainStoreIndex],
+ &leafChainIsnulls[chainStoreIndex], isNulls);
nToInsert++;
toDelete[nToDelete] = i;
nToDelete++;
@@ -795,7 +816,7 @@ doPickSplit(Relation index, SpGistState *state,
{
/* We could see a DEAD tuple as first/only chain item */
Assert(i == current->offnum);
- Assert(it->nextOffset == InvalidOffsetNumber);
+ Assert(SGLT_GET_OFFSET(it) == InvalidOffsetNumber);
toDelete[nToDelete] = i;
nToDelete++;
/* replacing it with redirect will save no space */
@@ -803,7 +824,7 @@ doPickSplit(Relation index, SpGistState *state,
else
elog(ERROR, "unexpected SPGiST tuple state: %d", it->tupstate);
- i = it->nextOffset;
+ i = SGLT_GET_OFFSET(it);
}
}
in.nTuples = nToInsert;
@@ -816,10 +837,17 @@ doPickSplit(Relation index, SpGistState *state,
*/
in.datums[in.nTuples] = SGLTDATUM(newLeafTuple, state);
heapPtrs[in.nTuples] = newLeafTuple->heapPtr;
+ chainStoreIndex = in.nTuples * natts;
+ spgDeformLeafTuple(newLeafTuple, state, &leafChainDatums[chainStoreIndex],
+ &leafChainIsnulls[chainStoreIndex], isNulls);
in.nTuples++;
memset(&out, 0, sizeof(out));
+ /*
+ * Process collected key values of tuples from the chain. Included values
+ * are used to build fresh leaf tuples unchanged.
+ */
if (!isNulls)
{
/*
@@ -837,9 +865,13 @@ doPickSplit(Relation index, SpGistState *state,
totalLeafSizes = 0;
for (i = 0; i < in.nTuples; i++)
{
+ chainStoreIndex = i * natts;
+ leafChainDatums[chainStoreIndex] = (Datum) out.leafTupleDatums[i];
+ leafChainIsnulls[chainStoreIndex] = false;
+
newLeafs[i] = spgFormLeafTuple(state, heapPtrs + i,
- out.leafTupleDatums[i],
- false);
+ &leafChainDatums[chainStoreIndex],
+ &leafChainIsnulls[chainStoreIndex]);
totalLeafSizes += newLeafs[i]->size + sizeof(ItemIdData);
}
}
@@ -860,9 +892,16 @@ doPickSplit(Relation index, SpGistState *state,
totalLeafSizes = 0;
for (i = 0; i < in.nTuples; i++)
{
+ /*
+ * Nulls tree can contain only null key values.
+ */
+ chainStoreIndex = i * natts;
+ leafChainDatums[chainStoreIndex] = (Datum) 0;
+ leafChainIsnulls[chainStoreIndex] = true;
+
newLeafs[i] = spgFormLeafTuple(state, heapPtrs + i,
- (Datum) 0,
- true);
+ &leafChainDatums[chainStoreIndex],
+ &leafChainIsnulls[chainStoreIndex]);
totalLeafSizes += newLeafs[i]->size + sizeof(ItemIdData);
}
}
@@ -1196,10 +1235,10 @@ doPickSplit(Relation index, SpGistState *state,
if (ItemPointerIsValid(&nodes[n]->t_tid))
{
Assert(ItemPointerGetBlockNumber(&nodes[n]->t_tid) == leafBlock);
- it->nextOffset = ItemPointerGetOffsetNumber(&nodes[n]->t_tid);
+ SGLT_SET_OFFSET(it, ItemPointerGetOffsetNumber(&nodes[n]->t_tid));
}
else
- it->nextOffset = InvalidOffsetNumber;
+ SGLT_SET_OFFSET(it, InvalidOffsetNumber);
/* Insert it on page */
newoffset = SpGistPageAddNewItem(state, BufferGetPage(leafBuffer),
@@ -1889,67 +1928,83 @@ spgSplitNodeAction(Relation index, SpGistState *state,
*/
bool
spgdoinsert(Relation index, SpGistState *state,
- ItemPointer heapPtr, Datum datum, bool isnull)
+ ItemPointer heapPtr, Datum *datum, bool *isnull)
{
int level = 0;
- Datum leafDatum;
+ Datum *leafDatum;
int leafSize;
SPPageDesc current,
parent;
FmgrInfo *procinfo = NULL;
+ int i;
/*
* Look up FmgrInfo of the user-defined choose function once, to save
* cycles in the loop below.
*/
- if (!isnull)
+ if (!isnull[spgKeyColumn])
procinfo = index_getprocinfo(index, 1, SPGIST_CHOOSE_PROC);
/*
* Prepare the leaf datum to insert.
- *
+ */
+
+ leafDatum = (Datum *) palloc0(sizeof(Datum) * (IndexRelationGetNumberOfAttributes(index)));
+
+ /*
* If an optional "compress" method is provided, then call it to form the
- * leaf datum from the input datum. Otherwise store the input datum as
- * is. Since we don't use index_form_tuple in this AM, we have to make
- * sure value to be inserted is not toasted; FormIndexDatum doesn't
- * guarantee that. But we assume the "compress" method to return an
- * untoasted value.
+ * key datum from the input datum. Otherwise, store the input datum as is.
+ * Since we don't use index_form_tuple in this AM, we have to make sure
+ * value to be inserted is not toasted; FormIndexDatum doesn't guarantee
+ * that. But we assume the "compress" method to return an untoasted
+ * value.
*/
- if (!isnull)
+ if (!isnull[spgKeyColumn])
{
if (OidIsValid(index_getprocid(index, 1, SPGIST_COMPRESS_PROC)))
{
FmgrInfo *compressProcinfo = NULL;
compressProcinfo = index_getprocinfo(index, 1, SPGIST_COMPRESS_PROC);
- leafDatum = FunctionCall1Coll(compressProcinfo,
- index->rd_indcollation[0],
- datum);
+ leafDatum[spgKeyColumn] = FunctionCall1Coll(compressProcinfo,
+ index->rd_indcollation[0],
+ datum[spgKeyColumn]);
}
else
{
Assert(state->attLeafType.type == state->attType.type);
if (state->attType.attlen == -1)
- leafDatum = PointerGetDatum(PG_DETOAST_DATUM(datum));
+ leafDatum[spgKeyColumn] = PointerGetDatum(PG_DETOAST_DATUM(datum[spgKeyColumn]));
else
- leafDatum = datum;
+ leafDatum[spgKeyColumn] = datum[spgKeyColumn];
}
}
else
- leafDatum = (Datum) 0;
+ leafDatum[spgKeyColumn] = (Datum) 0;
+
+ for (i = 1; i < IndexRelationGetNumberOfAttributes(index); i++)
+ {
+ if (!isnull[i])
+ {
+ if (TupleDescAttr(state->includeTupdesc, i - 1)->attlen == -1)
+ leafDatum[i] = PointerGetDatum(PG_DETOAST_DATUM(datum[i]));
+ else
+ leafDatum[i] = datum[i];
+ }
+ else
+ leafDatum[i] = (Datum) 0;
+ }
+
/*
- * Compute space needed for a leaf tuple containing the given datum.
+ * Compute space needed on a page for a leaf tuple containing the given
+ * datum.
*
* If it isn't gonna fit, and the opclass can't reduce the datum size by
* suffixing, bail out now rather than getting into an endless loop.
*/
- if (!isnull)
- leafSize = SGLTHDRSZ + sizeof(ItemIdData) +
- SpGistGetTypeSize(&state->attLeafType, leafDatum);
- else
- leafSize = SGDTSIZE + sizeof(ItemIdData);
+ leafSize = spgLeafTupleSize(state, leafDatum, isnull) + sizeof(ItemIdData);
if (leafSize > SPGIST_PAGE_CAPACITY && !state->config.longValuesOK)
ereport(ERROR,
@@ -1961,7 +2016,7 @@ spgdoinsert(Relation index, SpGistState *state,
errhint("Values larger than a buffer page cannot be indexed.")));
/* Initialize "current" to the appropriate root page */
- current.blkno = isnull ? SPGIST_NULL_BLKNO : SPGIST_ROOT_BLKNO;
+ current.blkno = isnull[spgKeyColumn] ? SPGIST_NULL_BLKNO : SPGIST_ROOT_BLKNO;
current.buffer = InvalidBuffer;
current.page = NULL;
current.offnum = FirstOffsetNumber;
@@ -1995,7 +2050,7 @@ spgdoinsert(Relation index, SpGistState *state,
*/
current.buffer =
SpGistGetBuffer(index,
- GBUF_LEAF | (isnull ? GBUF_NULLS : 0),
+ GBUF_LEAF | (isnull[spgKeyColumn] ? GBUF_NULLS : 0),
Min(leafSize, SPGIST_PAGE_CAPACITY),
&isNew);
current.blkno = BufferGetBlockNumber(current.buffer);
@@ -2037,7 +2092,7 @@ spgdoinsert(Relation index, SpGistState *state,
current.page = BufferGetPage(current.buffer);
/* should not arrive at a page of the wrong type */
- if (isnull ? !SpGistPageStoresNulls(current.page) :
+ if (isnull[spgKeyColumn] ? !SpGistPageStoresNulls(current.page) :
SpGistPageStoresNulls(current.page))
elog(ERROR, "SPGiST index page %u has wrong nulls flag",
current.blkno);
@@ -2054,7 +2109,7 @@ spgdoinsert(Relation index, SpGistState *state,
{
/* it fits on page, so insert it and we're done */
addLeafTuple(index, state, leafTuple,
- ¤t, &parent, isnull, isNew);
+ ¤t, &parent, isnull[spgKeyColumn], isNew);
break;
}
else if ((sizeToSplit =
@@ -2068,14 +2123,14 @@ spgdoinsert(Relation index, SpGistState *state,
* chain to another leaf page rather than splitting it.
*/
Assert(!isNew);
- moveLeafs(index, state, ¤t, &parent, leafTuple, isnull);
+ moveLeafs(index, state, ¤t, &parent, leafTuple, isnull[spgKeyColumn]);
break; /* we're done */
}
else
{
/* picksplit */
if (doPickSplit(index, state, ¤t, &parent,
- leafTuple, level, isnull, isNew))
+ leafTuple, level, isnull[spgKeyColumn], isNew))
break; /* doPickSplit installed new tuples */
/* leaf tuple will not be inserted yet */
@@ -2110,8 +2165,8 @@ spgdoinsert(Relation index, SpGistState *state,
innerTuple = (SpGistInnerTuple) PageGetItem(current.page,
PageGetItemId(current.page, current.offnum));
- in.datum = datum;
- in.leafDatum = leafDatum;
+ in.datum = datum[spgKeyColumn];
+ in.leafDatum = leafDatum[spgKeyColumn];
in.level = level;
in.allTheSame = innerTuple->allTheSame;
in.hasPrefix = (innerTuple->prefixSize > 0);
@@ -2121,7 +2176,7 @@ spgdoinsert(Relation index, SpGistState *state,
memset(&out, 0, sizeof(out));
- if (!isnull)
+ if (!isnull[spgKeyColumn])
{
/* use user-defined choose method */
FunctionCall2Coll(procinfo,
@@ -2158,11 +2213,11 @@ spgdoinsert(Relation index, SpGistState *state,
/* Adjust level as per opclass request */
level += out.result.matchNode.levelAdd;
/* Replace leafDatum and recompute leafSize */
- if (!isnull)
+ if (!isnull[spgKeyColumn])
{
- leafDatum = out.result.matchNode.restDatum;
- leafSize = SGLTHDRSZ + sizeof(ItemIdData) +
- SpGistGetTypeSize(&state->attLeafType, leafDatum);
+ leafDatum[spgKeyColumn] = out.result.matchNode.restDatum;
+ leafSize = spgLeafTupleSize(state, leafDatum, isnull) +
+ sizeof(ItemIdData);
}
/*
@@ -2227,6 +2282,6 @@ spgdoinsert(Relation index, SpGistState *state,
SpGistSetLastUsedPage(index, parent.buffer);
UnlockReleaseBuffer(parent.buffer);
}
-
+ pfree(leafDatum);
return true;
}
diff --git a/src/backend/access/spgist/spginsert.c b/src/backend/access/spgist/spginsert.c
index e4508a2b92..b54ae85f6e 100644
--- a/src/backend/access/spgist/spginsert.c
+++ b/src/backend/access/spgist/spginsert.c
@@ -55,8 +55,7 @@ spgistBuildCallback(Relation index, ItemPointer tid, Datum *values,
* lock on some buffer. So we need to be willing to retry. We can flush
* any temp data when retrying.
*/
- while (!spgdoinsert(index, &buildstate->spgstate, tid,
- *values, *isnull))
+ while (!spgdoinsert(index, &buildstate->spgstate, tid, values, isnull))
{
MemoryContextReset(buildstate->tmpCtx);
}
@@ -226,7 +225,7 @@ spginsert(Relation index, Datum *values, bool *isnull,
* to avoid cumulative memory consumption. That means we also have to
* redo initSpGistState(), but it's cheap enough not to matter.
*/
- while (!spgdoinsert(index, &spgstate, ht_ctid, *values, *isnull))
+ while (!spgdoinsert(index, &spgstate, ht_ctid, values, isnull))
{
MemoryContextReset(insertCtx);
initSpGistState(&spgstate, index);
diff --git a/src/backend/access/spgist/spgscan.c b/src/backend/access/spgist/spgscan.c
index 4d506bfb9a..aff130f78a 100644
--- a/src/backend/access/spgist/spgscan.c
+++ b/src/backend/access/spgist/spgscan.c
@@ -28,7 +28,8 @@
typedef void (*storeRes_func) (SpGistScanOpaque so, ItemPointer heapPtr,
Datum leafValue, bool isNull, bool recheck,
- bool recheckDistances, double *distances);
+ bool recheckDistances, double *distances,
+ SpGistLeafTuple leafTuple);
/*
* Pairing heap comparison function for the SpGistSearchItem queue.
@@ -88,6 +89,9 @@ spgFreeSearchItem(SpGistScanOpaque so, SpGistSearchItem *item)
if (item->traversalValue)
pfree(item->traversalValue);
+ if (item->isLeaf && item->leafTuple)
+ pfree(item->leafTuple);
+
pfree(item);
}
@@ -134,6 +138,8 @@ spgAddStartItem(SpGistScanOpaque so, bool isnull)
startEntry->recheck = false;
startEntry->recheckDistances = false;
+ startEntry->leafTuple = NULL;
+
spgAddSearchItemToQueue(so, startEntry);
}
@@ -438,14 +444,30 @@ spgendscan(IndexScanDesc scan)
* Leaf SpGistSearchItem constructor, called in queue context
*/
static SpGistSearchItem *
-spgNewHeapItem(SpGistScanOpaque so, int level, ItemPointer heapPtr,
+spgNewHeapItem(SpGistScanOpaque so, int level, SpGistLeafTuple leafTuple,
Datum leafValue, bool recheck, bool recheckDistances,
bool isnull, double *distances)
{
SpGistSearchItem *item = spgAllocSearchItem(so, isnull, distances);
+ /*
+ * If there are INCLUDE attributes search item in the queue should contain
+ * them.
+ */
+ if (so->state.includeTupdesc)
+ {
+ Assert(so->state.includeTupdesc->natts);
+
+ item->leafTuple = palloc(leafTuple->size);
+ memcpy(item->leafTuple, leafTuple, leafTuple->size);
+ }
+ else
+ {
+ item->leafTuple = NULL;
+ }
+
item->level = level;
- item->heapPtr = *heapPtr;
+ item->heapPtr = leafTuple->heapPtr;
/* copy value to queue cxt out of tmp cxt */
item->value = isnull ? (Datum) 0 :
datumCopy(leafValue, so->state.attLeafType.attbyval,
@@ -503,6 +525,8 @@ spgLeafTest(SpGistScanOpaque so, SpGistSearchItem *item,
in.returnData = so->want_itup;
in.leafDatum = SGLTDATUM(leafTuple, &so->state);
+
+
out.leafValue = (Datum) 0;
out.recheck = false;
out.distances = NULL;
@@ -528,7 +552,7 @@ spgLeafTest(SpGistScanOpaque so, SpGistSearchItem *item,
/* the scan is ordered -> add the item to the queue */
MemoryContext oldCxt = MemoryContextSwitchTo(so->traversalCxt);
SpGistSearchItem *heapItem = spgNewHeapItem(so, item->level,
- &leafTuple->heapPtr,
+ leafTuple,
leafValue,
recheck,
recheckDistances,
@@ -543,8 +567,10 @@ spgLeafTest(SpGistScanOpaque so, SpGistSearchItem *item,
{
/* non-ordered scan, so report the item right away */
Assert(!recheckDistances);
+
storeRes(so, &leafTuple->heapPtr, leafValue, isnull,
- recheck, false, NULL);
+ recheck, false, NULL, leafTuple);
+
*reportedSome = true;
}
}
@@ -736,7 +762,7 @@ spgTestLeafTuple(SpGistScanOpaque so,
/* dead tuple should be first in chain */
Assert(offset == ItemPointerGetOffsetNumber(&item->heapPtr));
/* No live entries on this page */
- Assert(leafTuple->nextOffset == InvalidOffsetNumber);
+ Assert(SGLT_GET_OFFSET(leafTuple) == InvalidOffsetNumber);
return SpGistBreakOffsetNumber;
}
}
@@ -750,7 +776,7 @@ spgTestLeafTuple(SpGistScanOpaque so,
spgLeafTest(so, item, leafTuple, isnull, reportedSome, storeRes);
- return leafTuple->nextOffset;
+ return SGLT_GET_OFFSET(leafTuple);
}
/*
@@ -782,8 +808,8 @@ redirect:
{
/* We store heap items in the queue only in case of ordered search */
Assert(so->numberOfNonNullOrderBys > 0);
- storeRes(so, &item->heapPtr, item->value, item->isNull,
- item->recheck, item->recheckDistances, item->distances);
+ storeRes(so, &item->heapPtr, item->value, item->isNull, item->recheck,
+ item->recheckDistances, item->distances, item->leafTuple);
reportedSome = true;
}
else
@@ -877,7 +903,7 @@ redirect:
static void
storeBitmap(SpGistScanOpaque so, ItemPointer heapPtr,
Datum leafValue, bool isnull, bool recheck, bool recheckDistances,
- double *distances)
+ double *distances, SpGistLeafTuple leafTuple)
{
Assert(!recheckDistances && !distances);
tbm_add_tuples(so->tbm, heapPtr, 1, recheck);
@@ -904,7 +930,7 @@ spggetbitmap(IndexScanDesc scan, TIDBitmap *tbm)
static void
storeGettuple(SpGistScanOpaque so, ItemPointer heapPtr,
Datum leafValue, bool isnull, bool recheck, bool recheckDistances,
- double *nonNullDistances)
+ double *nonNullDistances, SpGistLeafTuple leafTuple)
{
Assert(so->nPtrs < MaxIndexTuplesPerPage);
so->heapPtrs[so->nPtrs] = *heapPtr;
@@ -949,9 +975,38 @@ storeGettuple(SpGistScanOpaque so, ItemPointer heapPtr,
* Reconstruct index data. We have to copy the datum out of the temp
* context anyway, so we may as well create the tuple here.
*/
- so->reconTups[so->nPtrs] = heap_form_tuple(so->indexTupDesc,
- &leafValue,
- &isnull);
+ if (so->state.includeTupdesc)
+ {
+ /* Add INCLUDE attributes */
+ Datum *leafDatums;
+ bool *leafIsnulls;
+
+ Assert(so->state.includeTupdesc->natts);
+
+ leafDatums = (Datum *) palloc(sizeof(Datum) * (so->state.includeTupdesc->natts + 1));
+ leafIsnulls = (bool *) palloc(sizeof(bool) * (so->state.includeTupdesc->natts + 1));
+
+ spgDeformLeafTuple(leafTuple, &so->state, leafDatums, leafIsnulls, isnull);
+
+ /*
+ * override key value extracted from LeafTuple in case we've
+ * reconstructed it already
+ */
+ leafDatums[spgKeyColumn] = leafValue;
+ leafIsnulls[spgKeyColumn] = isnull;
+
+ so->reconTups[so->nPtrs] = heap_form_tuple(so->indexTupDesc,
+ leafDatums,
+ leafIsnulls);
+ pfree(leafDatums);
+ pfree(leafIsnulls);
+ }
+ else
+ {
+ so->reconTups[so->nPtrs] = heap_form_tuple(so->indexTupDesc,
+ &leafValue,
+ &isnull);
+ }
}
so->nPtrs++;
}
@@ -1019,6 +1074,10 @@ spgcanreturn(Relation index, int attno)
{
SpGistCache *cache;
+ /* INCLUDE attributes can always be fetched for index-only scans */
+ if (attno > 1)
+ return true;
+
/* We can do it if the opclass config function says so */
cache = spgGetCache(index);
diff --git a/src/backend/access/spgist/spgutils.c b/src/backend/access/spgist/spgutils.c
index 0efe05e552..bffe945843 100644
--- a/src/backend/access/spgist/spgutils.c
+++ b/src/backend/access/spgist/spgutils.c
@@ -31,7 +31,18 @@
#include "utils/index_selfuncs.h"
#include "utils/lsyscache.h"
#include "utils/syscache.h"
+#include "access/itup.h"
+#include "access/detoast.h"
+#include "access/toast_internals.h"
+#include "access/heaptoast.h"
+#include "utils/expandeddatum.h"
+/* Does att's datatype allow packing into the 1-byte-header varlena format? */
+#define ATT_IS_PACKABLE(att) \
+ ((att)->attlen == -1 && (att)->attstorage != TYPSTORAGE_PLAIN)
+
+Size spgIncludedDataSize(TupleDesc tupleDesc, Datum *values,
+ bool *isnull, Size start);
/*
* SP-GiST handler function: return IndexAmRoutine with access method parameters
@@ -49,7 +60,7 @@ spghandler(PG_FUNCTION_ARGS)
amroutine->amcanorderbyop = true;
amroutine->amcanbackward = false;
amroutine->amcanunique = false;
- amroutine->amcanmulticol = false;
+ amroutine->amcanmulticol = true;
amroutine->amoptionalkey = true;
amroutine->amsearcharray = false;
amroutine->amsearchnulls = true;
@@ -57,7 +68,7 @@ spghandler(PG_FUNCTION_ARGS)
amroutine->amclusterable = false;
amroutine->ampredlocks = false;
amroutine->amcanparallel = false;
- amroutine->amcaninclude = false;
+ amroutine->amcaninclude = true;
amroutine->amusemaintenanceworkmem = false;
amroutine->amparallelvacuumoptions =
VACUUM_OPTION_PARALLEL_BULKDEL | VACUUM_OPTION_PARALLEL_COND_CLEANUP;
@@ -116,14 +127,21 @@ spgGetCache(Relation index)
cache = MemoryContextAllocZero(index->rd_indexcxt,
sizeof(SpGistCache));
- /* SPGiST doesn't support multi-column indexes */
- Assert(index->rd_att->natts == 1);
+ /*
+ * SPGiST should have one key column and can also have INCLUDE
+ * columns
+ */
+ if (IndexRelationGetNumberOfKeyAttributes(index) != 1)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("SPGiST index can have only one key column")));
/*
- * Get the actual data type of the indexed column from the index
- * tupdesc. We pass this to the opclass config function so that
- * polymorphic opclasses are possible.
+ * Get the actual data type of the key column from the index tupdesc.
+ * We pass this to the opclass config function so that polymorphic
+ * opclasses are possible.
*/
+
atttype = TupleDescAttr(index->rd_att, 0)->atttypid;
/* Call the config function to get config info for the opclass */
@@ -156,6 +174,7 @@ spgGetCache(Relation index)
fillTypeDesc(&cache->attPrefixType, cache->config.prefixType);
fillTypeDesc(&cache->attLabelType, cache->config.labelType);
+
/* Last, get the lastUsedPages data from the metapage */
metabuffer = ReadBuffer(index, SPGIST_METAPAGE_BLKNO);
LockBuffer(metabuffer, BUFFER_LOCK_SHARE);
@@ -177,7 +196,23 @@ spgGetCache(Relation index)
/* assume it's up to date */
cache = (SpGistCache *) index->rd_amcache;
}
+ /* Form descriptor for INCLUDE columns if any */
+ if (IndexRelationGetNumberOfAttributes(index) > 1)
+ {
+ int i;
+
+ cache->includeTupdesc = CreateTemplateTupleDesc(
+ IndexRelationGetNumberOfAttributes(index) - 1);
+ for (i = 0; i < IndexRelationGetNumberOfAttributes(index) - 1; i++)
+ {
+ TupleDescInitEntry(cache->includeTupdesc, i + 1, NULL,
+ TupleDescAttr(index->rd_att, i + 1)->atttypid,
+ -1, 0);
+ }
+ }
+ else
+ cache->includeTupdesc = NULL;
return cache;
}
@@ -190,6 +225,7 @@ initSpGistState(SpGistState *state, Relation index)
/* Get cached static information about index */
cache = spgGetCache(index);
+ state->includeTupdesc = cache->includeTupdesc;
state->config = cache->config;
state->attType = cache->attType;
state->attLeafType = cache->attLeafType;
@@ -603,8 +639,8 @@ spgoptions(Datum reloptions, bool validate)
/*
* Get the space needed to store a non-null datum of the indicated type.
- * Note the result is already rounded up to a MAXALIGN boundary.
- * Also, we follow the SPGiST convention that pass-by-val types are
+ * Note the result is not maxaligned and this should be done by the caller if
+ * needed. Also, we follow the SPGiST convention that pass-by-val types are
* just stored in their Datum representation (compare memcpyDatum).
*/
unsigned int
@@ -619,7 +655,7 @@ SpGistGetTypeSize(SpGistTypeDesc *att, Datum datum)
else
size = VARSIZE_ANY(datum);
- return MAXALIGN(size);
+ return size;
}
/*
@@ -642,36 +678,205 @@ memcpyDatum(void *target, SpGistTypeDesc *att, Datum datum)
}
/*
- * Construct a leaf tuple containing the given heap TID and datum value
+ * Private version of heap_compute_data_size with start address not
+ * at MAXALIGN boundary. The reason is that start address (and alignment)
+ * influence alignment of each of next values and overall size of INCLUDE
+ * data area in SpGiST leaf tuple. MAXALINGing first INCLUDE attribute is
+ * avoided for not to introduce unnecessary gap before it.
+ */
+Size
+spgIncludedDataSize(TupleDesc tupleDesc,
+ Datum *values,
+ bool *isnull, Size start)
+{
+ Size data_length = 0;
+ int i;
+ int numberOfAttributes = tupleDesc->natts;
+
+ data_length = start;
+ for (i = 0; i < numberOfAttributes; i++)
+ {
+ Datum val;
+ Form_pg_attribute atti;
+
+ if (isnull[i])
+ continue;
+
+ val = values[i];
+ atti = TupleDescAttr(tupleDesc, i);
+
+ if (ATT_IS_PACKABLE(atti) &&
+ VARATT_CAN_MAKE_SHORT(DatumGetPointer(val)))
+ {
+ /*
+ * we're anticipating converting to a short varlena header, so
+ * adjust length and don't count any alignment
+ */
+ data_length += VARATT_CONVERTED_SHORT_SIZE(DatumGetPointer(val));
+ }
+ else if (atti->attlen == -1 &&
+ VARATT_IS_EXTERNAL_EXPANDED(DatumGetPointer(val)))
+ {
+ /*
+ * we want to flatten the expanded value so that the constructed
+ * tuple doesn't depend on it
+ */
+ data_length = att_align_nominal(data_length, atti->attalign);
+ data_length += EOH_get_flat_size(DatumGetEOHP(val));
+ }
+ else
+ {
+ data_length = att_align_datum(data_length, atti->attalign,
+ atti->attlen, val);
+ data_length = att_addlength_datum(data_length, atti->attlen,
+ val);
+ }
+ }
+ return data_length - start;
+}
+
+/* Calculate overall leaf tuple size. SGLTHDRSZ is MAXALIGNed for backward
+ * compatibility and there might be a gap between header and key data. After
+ * key data there are no such gaps more than is is necessary for each value
+ * alignment. Overall result is MAXALIGNed which is anyway unavoidable
+ * when placing a tuple on a page.
+ */
+unsigned int
+spgLeafTupleSize(SpGistState *state, Datum *datum, bool *isnull)
+{
+ /* compute space needed, nullmask size and offset for INCLUDE attributes */
+ unsigned int size = SGLTHDRSZ;
+ unsigned int i;
+
+ if (!isnull[spgKeyColumn])
+ /* key attribute size (not maxaligned) */
+ size += SpGistGetTypeSize(&state->attLeafType, datum[spgKeyColumn]);
+
+ if (state->includeTupdesc)
+ {
+ Assert(state->includeTupdesc->natts);
+ if (state->includeTupdesc->natts + 1 >= INDEX_MAX_KEYS)
+ ereport(ERROR,
+ (errcode(ERRCODE_TOO_MANY_COLUMNS),
+ errmsg("number of index columns (%d) exceeds limit (%d)",
+ state->includeTupdesc->natts, INDEX_MAX_KEYS)));
+ /* nullmask size */
+ for (i = 1; i <= state->includeTupdesc->natts; i++)
+ {
+ if (isnull[i])
+ {
+ size += (state->includeTupdesc->natts / 8) + 1;
+ break;
+ }
+ }
+ /* overall INCLUDE attributes size each with added proper alignment. */
+ size += spgIncludedDataSize(state->includeTupdesc, datum + 1, isnull + 1, size);
+ }
+ return MAXALIGN(size);
+}
+
+/*
+ * Construct a leaf tuple containing the given heap TID, key data and INCLUDE
+ * columns data. Key data starts from MAXALIGN boundary for backward compatibility.
+ * Nullmask apply only to INCLUDE attributes and is placed just after key data if
+ * there is at least one NULL among INCLUDE attributes. It doesn't need alignment.
+ * Then all INCLUDE columns data follow aligned by their typealign-s.
*/
SpGistLeafTuple
spgFormLeafTuple(SpGistState *state, ItemPointer heapPtr,
- Datum datum, bool isnull)
+ Datum *datum, bool *isnull)
{
SpGistLeafTuple tup;
- unsigned int size;
+ unsigned int size = SGLTHDRSZ;
+ unsigned int include_offset = 0;
+ unsigned int nullmask_size = 0;
+ unsigned int data_offset = 0;
+ unsigned int data_size = 0;
+ uint16 tupmask = 0;
+ int i;
- /* compute space needed (note result is already maxaligned) */
- size = SGLTHDRSZ;
- if (!isnull)
- size += SpGistGetTypeSize(&state->attLeafType, datum);
+ /*
+ * Calculate space needed. If there are INCLUDE attributes also calculate
+ * sizes and offsets needed for heap_fill_tuple
+ */
+ if (!isnull[spgKeyColumn])
+ /* key attribute size (not maxaligned) */
+ size += SpGistGetTypeSize(&state->attLeafType, datum[spgKeyColumn]);
+
+ if (state->includeTupdesc)
+ {
+ Assert(state->includeTupdesc->natts);
+ if (state->includeTupdesc->natts + 1 >= INDEX_MAX_KEYS)
+ ereport(ERROR,
+ (errcode(ERRCODE_TOO_MANY_COLUMNS),
+ errmsg("number of index columns (%d) exceeds limit (%d)",
+ state->includeTupdesc->natts, INDEX_MAX_KEYS)));
+
+ include_offset = size;
+
+ for (i = 1; i <= state->includeTupdesc->natts; i++)
+ {
+ if (isnull[i])
+ {
+ nullmask_size = (state->includeTupdesc->natts / 8) + 1;
+ size += nullmask_size;
+ break;
+ }
+ }
+
+ /*
+ * Alignment of all INCLUDE attributes is counted inside data_size.
+ * data_offset itself is not aligned.
+ */
+ data_size = spgIncludedDataSize(state->includeTupdesc, datum + 1, isnull + 1, size);
+ data_offset = size;
+
+ size += data_size;
+ }
/*
- * Ensure that we can replace the tuple with a dead tuple later. This
- * test is unnecessary when !isnull, but let's be safe.
+ * Ensure that we can replace the tuple with a dead tuple later. This
+ * test is unnecessary when !isnull[spgKeyColumn], but let's be safe.
*/
if (size < SGDTSIZE)
size = SGDTSIZE;
/* OK, form the tuple */
- tup = (SpGistLeafTuple) palloc0(size);
+ tup = (SpGistLeafTuple) palloc0(MAXALIGN(size));
- tup->size = size;
- tup->nextOffset = InvalidOffsetNumber;
+ tup->size = MAXALIGN(size);
+ SGLT_SET_OFFSET(tup, InvalidOffsetNumber);
tup->heapPtr = *heapPtr;
- if (!isnull)
- memcpyDatum(SGLTDATAPTR(tup), &state->attLeafType, datum);
+ if (!isnull[spgKeyColumn])
+ memcpyDatum(SGLTDATAPTR(tup), &state->attLeafType, datum[spgKeyColumn]);
+
+ /* Add INCLUDE columns data to leaf tuple if any. */
+ if (state->includeTupdesc)
+ {
+ /*
+ * The start of INCLUDE attributes tuple (include_offset) is next
+ * byte after end of a key value and is not required to be aligned.
+ * Nullmask is included without alignment and values alignment are
+ * done by heap_fill_tuple() automatically.
+ */
+ heap_fill_tuple(state->includeTupdesc, datum + 1, isnull + 1,
+ (char *) tup + data_offset,
+ data_size, &tupmask,
+ (nullmask_size ? (bits8 *) tup + include_offset : NULL));
+
+ if (nullmask_size)
+ SGLT_SET_CONTAINSNULLMASK(tup, true);
+
+ /*
+ * We do this because heap_fill_tuple wants to initialize a "tupmask"
+ * which is used for HeapTuples, but the only relevant info is the
+ * "has variable attributes" field. We have already set the hasnull
+ * bit above.
+ */
+ if (tupmask & HEAP_HASVARWIDTH)
+ SGLT_SET_CONTAINSVARATT(tup, true);
+ }
return tup;
}
@@ -688,10 +893,10 @@ spgFormNodeTuple(SpGistState *state, Datum label, bool isnull)
unsigned int size;
unsigned short infomask = 0;
- /* compute space needed (note result is already maxaligned) */
+ /* compute space needed */
size = SGNTHDRSZ;
if (!isnull)
- size += SpGistGetTypeSize(&state->attLabelType, label);
+ size += MAXALIGN(SpGistGetTypeSize(&state->attLabelType, label));
/*
* Here we make sure that the size will fit in the field reserved for it
@@ -735,7 +940,7 @@ spgFormInnerTuple(SpGistState *state, bool hasPrefix, Datum prefix,
/* Compute size needed */
if (hasPrefix)
- prefixSize = SpGistGetTypeSize(&state->attPrefixType, prefix);
+ prefixSize = MAXALIGN(SpGistGetTypeSize(&state->attPrefixType, prefix));
else
prefixSize = 0;
@@ -1046,3 +1251,133 @@ spgproperty(Oid index_oid, int attno,
return true;
}
+
+/*
+ * Convert an SpGist tuple into palloc'd Datum/isnull arrays.
+ *
+ */
+void
+spgDeformLeafTuple(SpGistLeafTuple tup, SpGistState *state, Datum *datum, bool *isnull,
+ bool key_isnull)
+{
+ unsigned int include_offset; /* offset of INCLUDE data */
+ int off;
+ bits8 *nullmask_ptr = NULL; /* ptr to null bitmap in tuple */
+ char *tp;
+ bool slow = false; /* can we use/set attcacheoff? */
+ int i;
+
+ if (key_isnull)
+ {
+ datum[spgKeyColumn] = (Datum) 0;
+ isnull[spgKeyColumn] = true;
+ }
+ else
+ {
+ datum[spgKeyColumn] = SGLTDATUM(tup, state);
+ isnull[spgKeyColumn] = false;
+ }
+
+ if (state->includeTupdesc)
+ {
+ Assert(state->includeTupdesc->natts);
+ if (state->includeTupdesc->natts + 1 >= INDEX_MAX_KEYS)
+ ereport(ERROR,
+ (errcode(ERRCODE_TOO_MANY_COLUMNS),
+ errmsg("number of index columns (%d) exceeds limit (%d)",
+ state->includeTupdesc->natts, INDEX_MAX_KEYS)));
+
+ include_offset = key_isnull ? SGLTHDRSZ : SGLTHDRSZ + SpGistGetTypeSize(&state->attLeafType, datum[spgKeyColumn]);
+
+ tp = (char *) tup;
+ off = include_offset;
+
+ if (SGLT_GET_CONTAINSNULLMASK(tup))
+ {
+ nullmask_ptr = (bits8 *) tp + include_offset;
+ off += (state->includeTupdesc->natts) / 8 + 1;
+ }
+
+ if (state->attLeafType.attlen > 0 && !SGLT_GET_CONTAINSVARATT(tup) &&
+ !SGLT_GET_CONTAINSNULLMASK(tup))
+ /* can use attcacheoff for all attributes */
+ {
+ for (i = 1; i <= state->includeTupdesc->natts; i++)
+ {
+ Form_pg_attribute thisatt = TupleDescAttr(state->includeTupdesc, i - 1);
+
+ isnull[i] = false;
+ if (thisatt->attcacheoff >= 0)
+ off = thisatt->attcacheoff;
+ else
+ {
+ off = att_align_nominal(off, thisatt->attalign);
+ thisatt->attcacheoff = off;
+ }
+ datum[i] = fetchatt(thisatt, tp + off);
+ off = att_addlength_pointer(off, thisatt->attlen, tp + off);
+ }
+ }
+ else
+
+ /*
+ * general case: can use cache until first null or varlen
+ * attribute
+ */
+ {
+ if (state->attLeafType.attlen <= 0)
+ slow = true; /* can't use attcacheoff at all */
+
+ for (i = 1; i <= state->includeTupdesc->natts; i++)
+ {
+ Form_pg_attribute thisatt = TupleDescAttr(state->includeTupdesc, i - 1);
+
+ if (SGLT_GET_CONTAINSNULLMASK(tup))
+ {
+ if (att_isnull(i - 1, nullmask_ptr))
+ {
+ datum[i] = (Datum) 0;
+ isnull[i] = true;
+ slow = true; /* can't use attcacheoff anymore */
+ continue;
+ }
+ }
+
+ isnull[i] = false;
+
+ if (!slow && thisatt->attcacheoff >= 0)
+ off = thisatt->attcacheoff;
+ else if (thisatt->attlen == -1)
+ {
+ /*
+ * We can only cache the offset for a varlena attribute if
+ * the offset is already suitably aligned, so that there
+ * would be no pad bytes in any case: then the offset will
+ * be valid for either an aligned or unaligned value.
+ */
+ if (!slow && off == att_align_nominal(off, thisatt->attalign))
+ thisatt->attcacheoff = off;
+ else
+ {
+ off = att_align_pointer(off, thisatt->attalign, -1, tp + off);
+ slow = true;
+ }
+ }
+ else
+ {
+ /* not varlena, so safe to use att_align_nominal */
+ off = att_align_nominal(off, thisatt->attalign);
+
+ if (!slow)
+ thisatt->attcacheoff = off;
+ }
+
+ datum[i] = fetchatt(thisatt, tp + off);
+ off = att_addlength_pointer(off, thisatt->attlen, tp + off);
+
+ if (thisatt->attlen <= 0)
+ slow = true; /* can't use attcacheoff anymore */
+ }
+ }
+ }
+}
diff --git a/src/backend/access/spgist/spgvacuum.c b/src/backend/access/spgist/spgvacuum.c
index bd98707f3c..f23f9d0b1e 100644
--- a/src/backend/access/spgist/spgvacuum.c
+++ b/src/backend/access/spgist/spgvacuum.c
@@ -168,23 +168,28 @@ vacuumLeafPage(spgBulkDeleteState *bds, Relation index, Buffer buffer,
}
/* Form predecessor map, too */
- if (lt->nextOffset != InvalidOffsetNumber)
+ if (SGLT_GET_OFFSET(lt) != InvalidOffsetNumber)
{
/* paranoia about corrupted chain links */
- if (lt->nextOffset < FirstOffsetNumber ||
- lt->nextOffset > max ||
- predecessor[lt->nextOffset] != InvalidOffsetNumber)
+ if (SGLT_GET_OFFSET(lt) < FirstOffsetNumber ||
+ SGLT_GET_OFFSET(lt) > max ||
+ predecessor[SGLT_GET_OFFSET(lt)] != InvalidOffsetNumber)
elog(ERROR, "inconsistent tuple chain links in page %u of index \"%s\"",
BufferGetBlockNumber(buffer),
RelationGetRelationName(index));
- predecessor[lt->nextOffset] = i;
+ predecessor[SGLT_GET_OFFSET(lt)] = i;
}
}
else if (lt->tupstate == SPGIST_REDIRECT)
{
SpGistDeadTuple dt = (SpGistDeadTuple) lt;
- Assert(dt->nextOffset == InvalidOffsetNumber);
+ /*
+ * Dead tuple nextOffset is allowed to have any values of two
+ * highest bits in case it is inherited from SpGistLeafTuple where
+ * these bits have their own meaning.
+ */
+ Assert(SGLT_GET_OFFSET(dt) == InvalidOffsetNumber);
Assert(ItemPointerIsValid(&dt->pointer));
/*
@@ -201,7 +206,7 @@ vacuumLeafPage(spgBulkDeleteState *bds, Relation index, Buffer buffer,
}
else
{
- Assert(lt->nextOffset == InvalidOffsetNumber);
+ Assert(SGLT_GET_OFFSET(lt) == InvalidOffsetNumber);
}
}
@@ -250,7 +255,7 @@ vacuumLeafPage(spgBulkDeleteState *bds, Relation index, Buffer buffer,
prevLive = deletable[i] ? InvalidOffsetNumber : i;
/* scan down the chain ... */
- j = head->nextOffset;
+ j = SGLT_GET_OFFSET(head);
while (j != InvalidOffsetNumber)
{
SpGistLeafTuple lt;
@@ -301,7 +306,7 @@ vacuumLeafPage(spgBulkDeleteState *bds, Relation index, Buffer buffer,
interveningDeletable = false;
}
- j = lt->nextOffset;
+ j = SGLT_GET_OFFSET(lt);
}
if (prevLive == InvalidOffsetNumber)
@@ -366,7 +371,7 @@ vacuumLeafPage(spgBulkDeleteState *bds, Relation index, Buffer buffer,
lt = (SpGistLeafTuple) PageGetItem(page,
PageGetItemId(page, chainSrc[i]));
Assert(lt->tupstate == SPGIST_LIVE);
- lt->nextOffset = chainDest[i];
+ SGLT_SET_OFFSET(lt, chainDest[i]);
}
MarkBufferDirty(buffer);
diff --git a/src/backend/access/spgist/spgxlog.c b/src/backend/access/spgist/spgxlog.c
index 7be2291d07..bbc2b91abc 100644
--- a/src/backend/access/spgist/spgxlog.c
+++ b/src/backend/access/spgist/spgxlog.c
@@ -122,8 +122,8 @@ spgRedoAddLeaf(XLogReaderState *record)
head = (SpGistLeafTuple) PageGetItem(page,
PageGetItemId(page, xldata->offnumHeadLeaf));
- Assert(head->nextOffset == leafTupleHdr.nextOffset);
- head->nextOffset = xldata->offnumLeaf;
+ Assert(SGLT_GET_OFFSET(head) == SGLT_GET_OFFSET(&leafTupleHdr));
+ SGLT_SET_OFFSET(head, xldata->offnumLeaf);
}
}
else
@@ -822,7 +822,7 @@ spgRedoVacuumLeaf(XLogReaderState *record)
lt = (SpGistLeafTuple) PageGetItem(page,
PageGetItemId(page, chainSrc[i]));
Assert(lt->tupstate == SPGIST_LIVE);
- lt->nextOffset = chainDest[i];
+ SGLT_SET_OFFSET(lt, chainDest[i]);
}
PageSetLSN(page, lsn);
diff --git a/src/include/access/spgist_private.h b/src/include/access/spgist_private.h
index 00b98ec6a0..84ef6095ea 100644
--- a/src/include/access/spgist_private.h
+++ b/src/include/access/spgist_private.h
@@ -22,13 +22,14 @@
#include "utils/geo_decls.h"
#include "utils/relcache.h"
-
typedef struct SpGistOptions
{
int32 varlena_header_; /* varlena header (do not touch directly!) */
int fillfactor; /* page fill factor in percent (0..100) */
} SpGistOptions;
+#define spgKeyColumn 0
+
#define SpGistGetFillFactor(relation) \
(AssertMacro(relation->rd_rel->relkind == RELKIND_INDEX && \
relation->rd_rel->relam == SPGIST_AM_OID), \
@@ -141,6 +142,7 @@ typedef struct SpGistState
SpGistTypeDesc attLeafType; /* type of leaf-tuple values */
SpGistTypeDesc attPrefixType; /* type of inner-tuple prefix values */
SpGistTypeDesc attLabelType; /* type of node label values */
+ TupleDesc includeTupdesc; /* tuple descriptor of INCLUDE columns */
char *deadTupleStorage; /* workspace for spgFormDeadTuple */
@@ -148,104 +150,6 @@ typedef struct SpGistState
bool isBuild; /* true if doing index build */
} SpGistState;
-typedef struct SpGistSearchItem
-{
- pairingheap_node phNode; /* pairing heap node */
- Datum value; /* value reconstructed from parent or
- * leafValue if heaptuple */
- void *traversalValue; /* opclass-specific traverse value */
- int level; /* level of items on this page */
- ItemPointerData heapPtr; /* heap info, if heap tuple */
- bool isNull; /* SearchItem is NULL item */
- bool isLeaf; /* SearchItem is heap item */
- bool recheck; /* qual recheck is needed */
- bool recheckDistances; /* distance recheck is needed */
-
- /* array with numberOfOrderBys entries */
- double distances[FLEXIBLE_ARRAY_MEMBER];
-} SpGistSearchItem;
-
-#define SizeOfSpGistSearchItem(n_distances) \
- (offsetof(SpGistSearchItem, distances) + sizeof(double) * (n_distances))
-
-/*
- * Private state of an index scan
- */
-typedef struct SpGistScanOpaqueData
-{
- SpGistState state; /* see above */
- pairingheap *scanQueue; /* queue of to be visited items */
- MemoryContext tempCxt; /* short-lived memory context */
- MemoryContext traversalCxt; /* single scan lifetime memory context */
-
- /* Control flags showing whether to search nulls and/or non-nulls */
- bool searchNulls; /* scan matches (all) null entries */
- bool searchNonNulls; /* scan matches (some) non-null entries */
-
- /* Index quals to be passed to opclass (null-related quals removed) */
- int numberOfKeys; /* number of index qualifier conditions */
- ScanKey keyData; /* array of index qualifier descriptors */
- int numberOfOrderBys; /* number of ordering operators */
- int numberOfNonNullOrderBys; /* number of ordering operators
- * with non-NULL arguments */
- ScanKey orderByData; /* array of ordering op descriptors */
- Oid *orderByTypes; /* array of ordering op return types */
- int *nonNullOrderByOffsets; /* array of offset of non-NULL
- * ordering keys in the original array */
- Oid indexCollation; /* collation of index column */
-
- /* Opclass defined functions: */
- FmgrInfo innerConsistentFn;
- FmgrInfo leafConsistentFn;
-
- /* Pre-allocated workspace arrays: */
- double *zeroDistances;
- double *infDistances;
-
- /* These fields are only used in amgetbitmap scans: */
- TIDBitmap *tbm; /* bitmap being filled */
- int64 ntids; /* number of TIDs passed to bitmap */
-
- /* These fields are only used in amgettuple scans: */
- bool want_itup; /* are we reconstructing tuples? */
- TupleDesc indexTupDesc; /* if so, tuple descriptor for them */
- int nPtrs; /* number of TIDs found on current page */
- int iPtr; /* index for scanning through same */
- ItemPointerData heapPtrs[MaxIndexTuplesPerPage]; /* TIDs from cur page */
- bool recheck[MaxIndexTuplesPerPage]; /* their recheck flags */
- bool recheckDistances[MaxIndexTuplesPerPage]; /* distance recheck
- * flags */
- HeapTuple reconTups[MaxIndexTuplesPerPage]; /* reconstructed tuples */
-
- /* distances (for recheck) */
- IndexOrderByDistance *distances[MaxIndexTuplesPerPage];
-
- /*
- * Note: using MaxIndexTuplesPerPage above is a bit hokey since
- * SpGistLeafTuples aren't exactly IndexTuples; however, they are larger,
- * so this is safe.
- */
-} SpGistScanOpaqueData;
-
-typedef SpGistScanOpaqueData *SpGistScanOpaque;
-
-/*
- * This struct is what we actually keep in index->rd_amcache. It includes
- * static configuration information as well as the lastUsedPages cache.
- */
-typedef struct SpGistCache
-{
- spgConfigOut config; /* filled in by opclass config method */
-
- SpGistTypeDesc attType; /* type of values to be indexed/restored */
- SpGistTypeDesc attLeafType; /* type of leaf-tuple values */
- SpGistTypeDesc attPrefixType; /* type of inner-tuple prefix values */
- SpGistTypeDesc attLabelType; /* type of node label values */
-
- SpGistLUPCache lastUsedPages; /* local storage of last-used info */
-} SpGistCache;
-
-
/*
* SPGiST tuple types. Note: inner, leaf, and dead tuple structs
* must have the same tupstate field in the same position! Real inner and
@@ -305,8 +209,8 @@ typedef SpGistInnerTupleData *SpGistInnerTuple;
* SPGiST node tuple: one node within an inner tuple
*
* Node tuples use the same header as ordinary Postgres IndexTuples, but
- * we do not use a null bitmap, because we know there is only one column
- * so the INDEX_NULL_MASK bit suffices. Also, pass-by-value datums are
+ * we do not use a null bitmap, because we know there is only one key column
+ * so the INDEX_NULL_MASK bit suffices. Also, pass-by-value datums are
* stored as a full Datum, the same convention as for inner tuple prefixes
* and leaf tuple datums.
*/
@@ -322,11 +226,13 @@ typedef SpGistNodeTupleData *SpGistNodeTuple;
PointerGetDatum(SGNTDATAPTR(x)))
/*
- * SPGiST leaf tuple: carries a datum and a heap tuple TID
+ * SPGiST leaf tuple: carries a key datum, a heap tuple TID and optional
+ * datums and nullmask of INCLUDE columns.
*
- * In the simplest case, the datum is the same as the indexed value; but
+ * In the simplest case, the key datum is the same as the indexed value; but
* it could also be a suffix or some other sort of delta that permits
* reconstruction given knowledge of the prefix path traversed to get here.
+ * Datums of INCLUDE columns are stored without modification.
*
* The size field is wider than could possibly be needed for an on-disk leaf
* tuple, but this allows us to form leaf tuples even when the datum is too
@@ -346,14 +252,43 @@ typedef SpGistNodeTupleData *SpGistNodeTuple;
* however, the SGDTSIZE limit ensures that's there's a Datum word there
* anyway, so SGLTDATUM can be applied safely as long as you don't do
* anything with the result.
+ *
+ * Minimum space to store SpGistLeafTuple plus ItemIdData on a page is 16 bytes,
+ * so 14 lower bits of nextOffset is enough to store tuple number in a chain
+ * on a page even if a page size is 64Kb. Two higher bits are to store per-tuple
+ * information for INCLUDE attributes: is there nulls mask exist, and are there
+ * any INCLUDE attributes of variable length type. If there are no INCLUDE
+ * columns these higher bits are not used.
+ *
+ * If there are INCLUDE columns, they are stored after a key value, each
+ * starting from its own typalign boundary. Unlike IndexTuple, first INCLUDE
+ * value does not need to start from MAXALIGN boundary, so SPGiST uses private
+ * routines to access them. Nullmask with size (number of INCLUDE columns)/8
+ * bytes is put without alignment between the key and the first INCLUDE column.
+ * If there is an alignment gap between them, nullmask has a good chance to fit
+ * into the gap, thus making its storage free of charge.
*/
+
typedef struct SpGistLeafTupleData
{
unsigned int tupstate:2, /* LIVE/REDIRECT/DEAD/PLACEHOLDER */
size:30; /* large enough for any palloc'able value */
- OffsetNumber nextOffset; /* next tuple in chain, or InvalidOffsetNumber */
+
+ /* ---------------
+ * nextOffset is laid out in the following fashion:
+ *
+ * 15th (high) bit: INCLUDE values has nulls
+ * 14th bit: INCLUDE values has var-length attributes
+ * 13-0 bit: number of next tuple in chain on a page, or InvalidOffsetNumber
+ * ---------------
+ */
+
+ unsigned short nextOffset; /* info for linking tuples in a chain on a leaf page,
+ and additional info for INCLUDE attributes */
ItemPointerData heapPtr; /* TID of represented heap tuple */
- /* leaf datum follows */
+ /* key column data follows */
+ /* nullmask of INCLUDE values follows if there are nulls in INCLUDE attributes*/
+ /* INCLUDE columns data follow if any */
} SpGistLeafTupleData;
typedef SpGistLeafTupleData *SpGistLeafTuple;
@@ -361,8 +296,25 @@ typedef SpGistLeafTupleData *SpGistLeafTuple;
#define SGLTHDRSZ MAXALIGN(sizeof(SpGistLeafTupleData))
#define SGLTDATAPTR(x) (((char *) (x)) + SGLTHDRSZ)
#define SGLTDATUM(x, s) ((s)->attLeafType.attbyval ? \
- *(Datum *) SGLTDATAPTR(x) : \
- PointerGetDatum(SGLTDATAPTR(x)))
+ *(Datum *) SGLTDATAPTR(x) : \
+ PointerGetDatum(SGLTDATAPTR(x)))
+/*
+ * Macros to access bit fields inside nextOffset independently.
+ */
+#define SGLT_GET_OFFSET(spgLeafTuple) ( (spgLeafTuple)->nextOffset & 0x3FFF )
+#define SGLT_GET_CONTAINSNULLMASK(spgLeafTuple) \
+ ( (bool)((spgLeafTuple)->nextOffset >> 15) )
+#define SGLT_GET_CONTAINSVARATT(spgLeafTuple) \
+ ( (bool)(((spgLeafTuple)->nextOffset & 0x4000) >> 14) )
+#define SGLT_SET_OFFSET(spgLeafTuple, offsetNumber) \
+ ( (spgLeafTuple)->nextOffset = \
+ ((spgLeafTuple)->nextOffset & 0xC000) | ((offsetNumber) & 0x3FFF) )
+#define SGLT_SET_CONTAINSNULLMASK(spgLeafTuple, is_null) \
+ ( (spgLeafTuple)->nextOffset = \
+ ((uint16)(bool)(is_null) << 15) | ((spgLeafTuple)->nextOffset & 0x3FFF) )
+#define SGLT_SET_CONTAINSVARATT(spgLeafTuple, is_varatt) \
+ ( (spgLeafTuple)->nextOffset = \
+ ((uint16)(bool)(is_varatt) << 14) | ((spgLeafTuple)->nextOffset & 0xBFFF) )
/*
* SPGiST dead tuple: declaration for examining non-live tuples
@@ -394,7 +346,6 @@ typedef SpGistDeadTupleData *SpGistDeadTuple;
* size plus sizeof(ItemIdData) (for the line pointer). This works correctly
* so long as tuple sizes are always maxaligned.
*/
-
/* Page capacity after allowing for fixed header and special space */
#define SPGIST_PAGE_CAPACITY \
MAXALIGN_DOWN(BLCKSZ - \
@@ -410,6 +361,105 @@ typedef SpGistDeadTupleData *SpGistDeadTuple;
Min(SpGistPageGetOpaque(p)->nPlaceholder, n) * \
(SGDTSIZE + sizeof(ItemIdData)))
+
+typedef struct SpGistSearchItem
+{
+ pairingheap_node phNode; /* pairing heap node */
+ Datum value; /* value reconstructed from parent or
+ * leafValue if heaptuple */
+ void *traversalValue; /* opclass-specific traverse value */
+ int level; /* level of items on this page */
+ ItemPointerData heapPtr; /* heap info, if heap tuple */
+ bool isNull; /* SearchItem is NULL item */
+ bool isLeaf; /* SearchItem is heap item */
+ bool recheck; /* qual recheck is needed */
+ bool recheckDistances; /* distance recheck is needed */
+ SpGistLeafTuple leafTuple;
+ /* array with numberOfOrderBys entries */
+ double distances[FLEXIBLE_ARRAY_MEMBER];
+} SpGistSearchItem;
+
+#define SizeOfSpGistSearchItem(n_distances) \
+ (offsetof(SpGistSearchItem, distances) + sizeof(double) * (n_distances))
+
+/*
+ * Private state of an index scan
+ */
+typedef struct SpGistScanOpaqueData
+{
+ SpGistState state; /* see above */
+ pairingheap *scanQueue; /* queue of to be visited items */
+ MemoryContext tempCxt; /* short-lived memory context */
+ MemoryContext traversalCxt; /* single scan lifetime memory context */
+
+ /* Control flags showing whether to search nulls and/or non-nulls */
+ bool searchNulls; /* scan matches (all) null entries */
+ bool searchNonNulls; /* scan matches (some) non-null entries */
+
+ /* Index quals to be passed to opclass (null-related quals removed) */
+ int numberOfKeys; /* number of index qualifier conditions */
+ ScanKey keyData; /* array of index qualifier descriptors */
+ int numberOfOrderBys; /* number of ordering operators */
+ int numberOfNonNullOrderBys; /* number of ordering operators
+ * with non-NULL arguments */
+ ScanKey orderByData; /* array of ordering op descriptors */
+ Oid *orderByTypes; /* array of ordering op return types */
+ int *nonNullOrderByOffsets; /* array of offset of non-NULL
+ * ordering keys in the original array */
+ Oid indexCollation; /* collation of index column */
+
+ /* Opclass defined functions: */
+ FmgrInfo innerConsistentFn;
+ FmgrInfo leafConsistentFn;
+
+ /* Pre-allocated workspace arrays: */
+ double *zeroDistances;
+ double *infDistances;
+
+ /* These fields are only used in amgetbitmap scans: */
+ TIDBitmap *tbm; /* bitmap being filled */
+ int64 ntids; /* number of TIDs passed to bitmap */
+
+ /* These fields are only used in amgettuple scans: */
+ bool want_itup; /* are we reconstructing tuples? */
+ TupleDesc indexTupDesc; /* if so, tuple descriptor for them */
+ int nPtrs; /* number of TIDs found on current page */
+ int iPtr; /* index for scanning through same */
+ ItemPointerData heapPtrs[MaxIndexTuplesPerPage]; /* TIDs from cur page */
+ bool recheck[MaxIndexTuplesPerPage]; /* their recheck flags */
+ bool recheckDistances[MaxIndexTuplesPerPage]; /* distance recheck
+ * flags */
+ HeapTuple reconTups[MaxIndexTuplesPerPage]; /* reconstructed tuples */
+
+ /* distances (for recheck) */
+ IndexOrderByDistance *distances[MaxIndexTuplesPerPage];
+
+ /*
+ * Note: using MaxIndexTuplesPerPage above is a bit hokey since
+ * SpGistLeafTuples aren't exactly IndexTuples; however, they are larger,
+ * so this is safe.
+ */
+} SpGistScanOpaqueData;
+
+typedef SpGistScanOpaqueData *SpGistScanOpaque;
+
+/*
+ * This struct is what we actually keep in index->rd_amcache. It includes
+ * static configuration information as well as the lastUsedPages cache.
+ */
+typedef struct SpGistCache
+{
+ spgConfigOut config; /* filled in by opclass config method */
+
+ SpGistTypeDesc attType; /* type of values to be indexed/restored */
+ SpGistTypeDesc attLeafType; /* type of leaf-tuple values */
+ SpGistTypeDesc attPrefixType; /* type of inner-tuple prefix values */
+ SpGistTypeDesc attLabelType; /* type of node label values */
+ TupleDesc includeTupdesc;
+
+ SpGistLUPCache lastUsedPages; /* local storage of last-used info */
+} SpGistCache;
+
/*
* XLOG stuff
*/
@@ -456,9 +506,10 @@ extern void SpGistInitPage(Page page, uint16 f);
extern void SpGistInitBuffer(Buffer b, uint16 f);
extern void SpGistInitMetapage(Page page);
extern unsigned int SpGistGetTypeSize(SpGistTypeDesc *att, Datum datum);
+extern unsigned int spgLeafTupleSize(SpGistState *state, Datum *datum, bool *isnull);
extern SpGistLeafTuple spgFormLeafTuple(SpGistState *state,
ItemPointer heapPtr,
- Datum datum, bool isnull);
+ Datum *datum, bool *isnull);
extern SpGistNodeTuple spgFormNodeTuple(SpGistState *state,
Datum label, bool isnull);
extern SpGistInnerTuple spgFormInnerTuple(SpGistState *state,
@@ -466,6 +517,8 @@ extern SpGistInnerTuple spgFormInnerTuple(SpGistState *state,
int nNodes, SpGistNodeTuple *nodes);
extern SpGistDeadTuple spgFormDeadTuple(SpGistState *state, int tupstate,
BlockNumber blkno, OffsetNumber offnum);
+extern void spgDeformLeafTuple(SpGistLeafTuple tup, SpGistState *state,
+ Datum *datum, bool *isnull, bool key_value_isnull);
extern Datum *spgExtractNodeLabels(SpGistState *state,
SpGistInnerTuple innerTuple);
extern OffsetNumber SpGistPageAddNewItem(SpGistState *state, Page page,
@@ -484,7 +537,7 @@ extern void spgPageIndexMultiDelete(SpGistState *state, Page page,
int firststate, int reststate,
BlockNumber blkno, OffsetNumber offnum);
extern bool spgdoinsert(Relation index, SpGistState *state,
- ItemPointer heapPtr, Datum datum, bool isnull);
+ ItemPointer heapPtr, Datum *datum, bool *isnull);
/* spgproc.c */
extern double *spg_key_orderbys_distances(Datum key, bool isLeaf,
diff --git a/src/test/regress/expected/amutils.out b/src/test/regress/expected/amutils.out
index d92a6d12c6..93e6a43b6d 100644
--- a/src/test/regress/expected/amutils.out
+++ b/src/test/regress/expected/amutils.out
@@ -169,9 +169,9 @@ select amname, prop, pg_indexam_has_property(a.oid, prop) as p
hash | bogus |
spgist | can_order | f
spgist | can_unique | f
- spgist | can_multi_col | f
+ spgist | can_multi_col | t
spgist | can_exclude | t
- spgist | can_include | f
+ spgist | can_include | t
spgist | bogus |
(36 rows)
diff --git a/src/test/regress/expected/index_including.out b/src/test/regress/expected/index_including.out
index 8e5d53e712..86510687c7 100644
--- a/src/test/regress/expected/index_including.out
+++ b/src/test/regress/expected/index_including.out
@@ -349,14 +349,13 @@ SELECT indexdef FROM pg_indexes WHERE tablename = 'tbl' ORDER BY indexname;
DROP TABLE tbl;
/*
- * 7. Check various AMs. All but btree and gist must fail.
+ * 7. Check various AMs. All but btree, gist and spgist must fail.
*/
CREATE TABLE tbl (c1 int,c2 int, c3 box, c4 box);
CREATE INDEX on tbl USING brin(c1, c2) INCLUDE (c3, c4);
ERROR: access method "brin" does not support included columns
CREATE INDEX on tbl USING gist(c3) INCLUDE (c1, c4);
CREATE INDEX on tbl USING spgist(c3) INCLUDE (c4);
-ERROR: access method "spgist" does not support included columns
CREATE INDEX on tbl USING gin(c1, c2) INCLUDE (c3, c4);
ERROR: access method "gin" does not support included columns
CREATE INDEX on tbl USING hash(c1, c2) INCLUDE (c3, c4);
diff --git a/src/test/regress/expected/index_including_spgist.out b/src/test/regress/expected/index_including_spgist.out
new file mode 100644
index 0000000000..213cce5c7c
--- /dev/null
+++ b/src/test/regress/expected/index_including_spgist.out
@@ -0,0 +1,143 @@
+/*
+ * 1.1. test CREATE INDEX with buffered build
+ */
+-- Regular index with included columns
+CREATE TABLE tbl_spgist (c1 int, c2 int, c3 int, c4 box);
+SET enable_seqscan TO off;
+-- size is chosen to exceed page size and trigger actual truncation
+INSERT INTO tbl_spgist SELECT x, 2*x, 3*x, box(point(x,x+1),point(2*x,2*x+1)) FROM generate_series(1,8000) AS x;
+CREATE INDEX tbl_spgist_idx ON tbl_spgist using spgist (c4) INCLUDE (c1,c2,c3);
+SELECT pg_get_indexdef(i.indexrelid)
+FROM pg_index i JOIN pg_class c ON i.indexrelid = c.oid
+WHERE i.indrelid = 'tbl_spgist'::regclass ORDER BY c.relname;
+ pg_get_indexdef
+-----------------------------------------------------------------------------------------
+ CREATE INDEX tbl_spgist_idx ON public.tbl_spgist USING spgist (c4) INCLUDE (c1, c2, c3)
+(1 row)
+
+SELECT * FROM tbl_spgist where c4 <@ box(point(1,1),point(10,10));
+ c1 | c2 | c3 | c4
+----+----+----+-------------
+ 1 | 2 | 3 | (2,3),(1,2)
+ 2 | 4 | 6 | (4,5),(2,3)
+ 3 | 6 | 9 | (6,7),(3,4)
+ 4 | 8 | 12 | (8,9),(4,5)
+(4 rows)
+
+SET enable_bitmapscan TO off;
+VACUUM ANALYZE tbl_spgist;
+EXPLAIN (costs off) SELECT * FROM tbl_spgist where c4 <@ box(point(1,1),point(10,10));
+ QUERY PLAN
+----------------------------------------------------
+ Index Only Scan using tbl_spgist_idx on tbl_spgist
+ Index Cond: (c4 <@ '(10,10),(1,1)'::box)
+(2 rows)
+
+SET enable_bitmapscan TO default;
+DROP TABLE tbl_spgist;
+/*
+ * 1.2. test CREATE INDEX with inserts
+ */
+-- Regular index with included columns
+CREATE TABLE tbl_spgist (c1 int, c2 int, c3 int, c4 box);
+-- size is chosen to exceed page size and trigger actual truncation
+CREATE INDEX tbl_spgist_idx ON tbl_spgist using spgist (c4) INCLUDE (c1,c2,c3);
+INSERT INTO tbl_spgist SELECT x, 2*x, 3*x, box(point(x,x+1),point(2*x,2*x+1)) FROM generate_series(1,8000) AS x;
+SELECT pg_get_indexdef(i.indexrelid)
+FROM pg_index i JOIN pg_class c ON i.indexrelid = c.oid
+WHERE i.indrelid = 'tbl_spgist'::regclass ORDER BY c.relname;
+ pg_get_indexdef
+-----------------------------------------------------------------------------------------
+ CREATE INDEX tbl_spgist_idx ON public.tbl_spgist USING spgist (c4) INCLUDE (c1, c2, c3)
+(1 row)
+
+SELECT * FROM tbl_spgist where c4 <@ box(point(1,1),point(10,10));
+ c1 | c2 | c3 | c4
+----+----+----+-------------
+ 1 | 2 | 3 | (2,3),(1,2)
+ 2 | 4 | 6 | (4,5),(2,3)
+ 3 | 6 | 9 | (6,7),(3,4)
+ 4 | 8 | 12 | (8,9),(4,5)
+(4 rows)
+
+SET enable_bitmapscan TO off;
+VACUUM ANALYZE tbl_spgist;
+EXPLAIN (costs off) SELECT * FROM tbl_spgist where c4 <@ box(point(1,1),point(10,10));
+ QUERY PLAN
+----------------------------------------------------
+ Index Only Scan using tbl_spgist_idx on tbl_spgist
+ Index Cond: (c4 <@ '(10,10),(1,1)'::box)
+(2 rows)
+
+SET enable_bitmapscan TO default;
+DROP TABLE tbl_spgist;
+/*
+ * 2. CREATE INDEX CONCURRENTLY
+ */
+CREATE TABLE tbl_spgist (c1 int, c2 int, c3 int, c4 box);
+INSERT INTO tbl_spgist SELECT x, 2*x, 3*x, box(point(x,x+1),point(2*x,2*x+1)) FROM generate_series(1,10) AS x;
+CREATE INDEX CONCURRENTLY tbl_spgist_idx ON tbl_spgist using spgist (c4) INCLUDE (c1,c2,c3);
+SELECT indexdef FROM pg_indexes WHERE tablename = 'tbl_spgist' ORDER BY indexname;
+ indexdef
+-----------------------------------------------------------------------------------------
+ CREATE INDEX tbl_spgist_idx ON public.tbl_spgist USING spgist (c4) INCLUDE (c1, c2, c3)
+(1 row)
+
+DROP TABLE tbl_spgist;
+/*
+ * 3. REINDEX
+ */
+CREATE TABLE tbl_spgist (c1 int, c2 int, c3 int, c4 box);
+INSERT INTO tbl_spgist SELECT x, 2*x, 3*x, box(point(x,x+1),point(2*x,2*x+1)) FROM generate_series(1,10) AS x;
+CREATE INDEX tbl_spgist_idx ON tbl_spgist using spgist (c4) INCLUDE (c1,c3);
+SELECT indexdef FROM pg_indexes WHERE tablename = 'tbl_spgist' ORDER BY indexname;
+ indexdef
+-------------------------------------------------------------------------------------
+ CREATE INDEX tbl_spgist_idx ON public.tbl_spgist USING spgist (c4) INCLUDE (c1, c3)
+(1 row)
+
+REINDEX INDEX tbl_spgist_idx;
+SELECT indexdef FROM pg_indexes WHERE tablename = 'tbl_spgist' ORDER BY indexname;
+ indexdef
+-------------------------------------------------------------------------------------
+ CREATE INDEX tbl_spgist_idx ON public.tbl_spgist USING spgist (c4) INCLUDE (c1, c3)
+(1 row)
+
+ALTER TABLE tbl_spgist DROP COLUMN c1;
+SELECT indexdef FROM pg_indexes WHERE tablename = 'tbl_spgist' ORDER BY indexname;
+ indexdef
+----------
+(0 rows)
+
+DROP TABLE tbl_spgist;
+/*
+ * 4. Update, delete values in indexed table.
+ */
+CREATE TABLE tbl_spgist (c1 int, c2 int, c3 int, c4 box);
+INSERT INTO tbl_spgist SELECT x, 2*x, 3*x, box(point(x,x+1),point(2*x,2*x+1)) FROM generate_series(1,10) AS x;
+CREATE INDEX tbl_spgist_idx ON tbl_spgist using spgist (c4) INCLUDE (c1,c3);
+UPDATE tbl_spgist SET c1 = 100 WHERE c1 = 2;
+UPDATE tbl_spgist SET c1 = 1 WHERE c1 = 3;
+DELETE FROM tbl_spgist WHERE c1 = 5 OR c3 = 12;
+DROP TABLE tbl_spgist;
+/*
+ * 5. Alter column type.
+ */
+CREATE TABLE tbl_spgist (c1 int, c2 int, c3 int, c4 box);
+INSERT INTO tbl_spgist SELECT x, 2*x, 3*x, box(point(x,x+1),point(2*x,2*x+1)) FROM generate_series(1,10) AS x;
+CREATE INDEX tbl_spgist_idx ON tbl_spgist using spgist (c4) INCLUDE (c1,c3);
+ALTER TABLE tbl_spgist ALTER c1 TYPE bigint;
+ALTER TABLE tbl_spgist ALTER c3 TYPE bigint;
+\d tbl_spgist
+ Table "public.tbl_spgist"
+ Column | Type | Collation | Nullable | Default
+--------+---------+-----------+----------+---------
+ c1 | bigint | | |
+ c2 | integer | | |
+ c3 | bigint | | |
+ c4 | box | | |
+Indexes:
+ "tbl_spgist_idx" spgist (c4) INCLUDE (c1, c3)
+
+RESET enable_seqscan;
+DROP TABLE tbl_spgist;
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 026ea880cd..985458a1a8 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -50,7 +50,7 @@ test: copy copyselect copydml insert insert_conflict
# ----------
test: create_misc create_operator create_procedure
# These depend on create_misc and create_operator
-test: create_index create_index_spgist create_view index_including index_including_gist
+test: create_index create_index_spgist create_view index_including index_including_gist index_including_spgist
# ----------
# Another group of parallel tests
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index 979d926119..f3df961535 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -68,6 +68,7 @@ test: create_index_spgist
test: create_view
test: index_including
test: index_including_gist
+test: index_including_spgist
test: create_aggregate
test: create_function_3
test: create_cast
diff --git a/src/test/regress/sql/index_including.sql b/src/test/regress/sql/index_including.sql
index 7e517483ad..44b340053b 100644
--- a/src/test/regress/sql/index_including.sql
+++ b/src/test/regress/sql/index_including.sql
@@ -182,7 +182,7 @@ SELECT indexdef FROM pg_indexes WHERE tablename = 'tbl' ORDER BY indexname;
DROP TABLE tbl;
/*
- * 7. Check various AMs. All but btree and gist must fail.
+ * 7. Check various AMs. All but btree, gist and spgist must fail.
*/
CREATE TABLE tbl (c1 int,c2 int, c3 box, c4 box);
CREATE INDEX on tbl USING brin(c1, c2) INCLUDE (c3, c4);
diff --git a/src/test/regress/sql/index_including_spgist.sql b/src/test/regress/sql/index_including_spgist.sql
new file mode 100644
index 0000000000..38ace74d4e
--- /dev/null
+++ b/src/test/regress/sql/index_including_spgist.sql
@@ -0,0 +1,84 @@
+/*
+ * 1.1. test CREATE INDEX with buffered build
+ */
+
+-- Regular index with included columns
+CREATE TABLE tbl_spgist (c1 int, c2 int, c3 int, c4 box);
+SET enable_seqscan TO off;
+-- size is chosen to exceed page size and trigger actual truncation
+INSERT INTO tbl_spgist SELECT x, 2*x, 3*x, box(point(x,x+1),point(2*x,2*x+1)) FROM generate_series(1,8000) AS x;
+CREATE INDEX tbl_spgist_idx ON tbl_spgist using spgist (c4) INCLUDE (c1,c2,c3);
+SELECT pg_get_indexdef(i.indexrelid)
+FROM pg_index i JOIN pg_class c ON i.indexrelid = c.oid
+WHERE i.indrelid = 'tbl_spgist'::regclass ORDER BY c.relname;
+SELECT * FROM tbl_spgist where c4 <@ box(point(1,1),point(10,10));
+SET enable_bitmapscan TO off;
+VACUUM ANALYZE tbl_spgist;
+EXPLAIN (costs off) SELECT * FROM tbl_spgist where c4 <@ box(point(1,1),point(10,10));
+SET enable_bitmapscan TO default;
+DROP TABLE tbl_spgist;
+
+/*
+ * 1.2. test CREATE INDEX with inserts
+ */
+
+-- Regular index with included columns
+CREATE TABLE tbl_spgist (c1 int, c2 int, c3 int, c4 box);
+-- size is chosen to exceed page size and trigger actual truncation
+CREATE INDEX tbl_spgist_idx ON tbl_spgist using spgist (c4) INCLUDE (c1,c2,c3);
+INSERT INTO tbl_spgist SELECT x, 2*x, 3*x, box(point(x,x+1),point(2*x,2*x+1)) FROM generate_series(1,8000) AS x;
+SELECT pg_get_indexdef(i.indexrelid)
+FROM pg_index i JOIN pg_class c ON i.indexrelid = c.oid
+WHERE i.indrelid = 'tbl_spgist'::regclass ORDER BY c.relname;
+SELECT * FROM tbl_spgist where c4 <@ box(point(1,1),point(10,10));
+SET enable_bitmapscan TO off;
+VACUUM ANALYZE tbl_spgist;
+EXPLAIN (costs off) SELECT * FROM tbl_spgist where c4 <@ box(point(1,1),point(10,10));
+SET enable_bitmapscan TO default;
+DROP TABLE tbl_spgist;
+
+/*
+ * 2. CREATE INDEX CONCURRENTLY
+ */
+CREATE TABLE tbl_spgist (c1 int, c2 int, c3 int, c4 box);
+INSERT INTO tbl_spgist SELECT x, 2*x, 3*x, box(point(x,x+1),point(2*x,2*x+1)) FROM generate_series(1,10) AS x;
+CREATE INDEX CONCURRENTLY tbl_spgist_idx ON tbl_spgist using spgist (c4) INCLUDE (c1,c2,c3);
+SELECT indexdef FROM pg_indexes WHERE tablename = 'tbl_spgist' ORDER BY indexname;
+DROP TABLE tbl_spgist;
+
+
+/*
+ * 3. REINDEX
+ */
+CREATE TABLE tbl_spgist (c1 int, c2 int, c3 int, c4 box);
+INSERT INTO tbl_spgist SELECT x, 2*x, 3*x, box(point(x,x+1),point(2*x,2*x+1)) FROM generate_series(1,10) AS x;
+CREATE INDEX tbl_spgist_idx ON tbl_spgist using spgist (c4) INCLUDE (c1,c3);
+SELECT indexdef FROM pg_indexes WHERE tablename = 'tbl_spgist' ORDER BY indexname;
+REINDEX INDEX tbl_spgist_idx;
+SELECT indexdef FROM pg_indexes WHERE tablename = 'tbl_spgist' ORDER BY indexname;
+ALTER TABLE tbl_spgist DROP COLUMN c1;
+SELECT indexdef FROM pg_indexes WHERE tablename = 'tbl_spgist' ORDER BY indexname;
+DROP TABLE tbl_spgist;
+
+/*
+ * 4. Update, delete values in indexed table.
+ */
+CREATE TABLE tbl_spgist (c1 int, c2 int, c3 int, c4 box);
+INSERT INTO tbl_spgist SELECT x, 2*x, 3*x, box(point(x,x+1),point(2*x,2*x+1)) FROM generate_series(1,10) AS x;
+CREATE INDEX tbl_spgist_idx ON tbl_spgist using spgist (c4) INCLUDE (c1,c3);
+UPDATE tbl_spgist SET c1 = 100 WHERE c1 = 2;
+UPDATE tbl_spgist SET c1 = 1 WHERE c1 = 3;
+DELETE FROM tbl_spgist WHERE c1 = 5 OR c3 = 12;
+DROP TABLE tbl_spgist;
+
+/*
+ * 5. Alter column type.
+ */
+CREATE TABLE tbl_spgist (c1 int, c2 int, c3 int, c4 box);
+INSERT INTO tbl_spgist SELECT x, 2*x, 3*x, box(point(x,x+1),point(2*x,2*x+1)) FROM generate_series(1,10) AS x;
+CREATE INDEX tbl_spgist_idx ON tbl_spgist using spgist (c4) INCLUDE (c1,c3);
+ALTER TABLE tbl_spgist ALTER c1 TYPE bigint;
+ALTER TABLE tbl_spgist ALTER c3 TYPE bigint;
+\d tbl_spgist
+RESET enable_seqscan;
+DROP TABLE tbl_spgist;
--
2.28.0
31 авг. 2020 г., в 16:57, Pavel Borisov <pashkin.elfe@gmail.com> написал(а):
I agree with all of your proposals and integrated them into v9.
I have a wild idea of renaming nextOffset in SpGistLeafTupleData.
Or at least documenting in comments that this field is more than just an offset.
This looks like assert rather than real runtime check in spgLeafTupleSize()
+ if (state->includeTupdesc->natts + 1 >= INDEX_MAX_KEYS)
+ ereport(ERROR,
+ (errcode(ERRCODE_TOO_MANY_COLUMNS),
+ errmsg("number of index columns (%d) exceeds limit (%d)",
+ state->includeTupdesc->natts, INDEX_MAX_KEYS)));
Even if you go with check, number of columns is state->includeTupdesc->natts + 1.
Also I'd refactor this
+ /* Form descriptor for INCLUDE columns if any */
+ if (IndexRelationGetNumberOfAttributes(index) > 1)
+ {
+ int i;
+
+ cache->includeTupdesc = CreateTemplateTupleDesc(
+ IndexRelationGetNumberOfAttributes(index) - 1);
+ for (i = 0; i < IndexRelationGetNumberOfAttributes(index) - 1; i++)
+ {
+ TupleDescInitEntry(cache->includeTupdesc, i + 1, NULL,
+ TupleDescAttr(index->rd_att, i + 1)->atttypid,
+ -1, 0);
+ }
+ }
+ else
+ cache->includeTupdesc = NULL;
into something like
cache->includeTupdesc = NULL;
for (i = 0; i < IndexRelationGetNumberOfAttributes(index) - 1; i++)
{
if (cache->includeTupdesc == NULL)
// init tuple description
// init entry
}
But, probably it's only a matter of taste.
Beside this, I think patch is ready for committer. If Anastasia has no objections, let's flip CF entry state.
Thanks!
Best regards, Andrey Borodin.
I have a wild idea of renaming nextOffset in SpGistLeafTupleData.
Or at least documenting in comments that this field is more than just an
offset.
Seems reasonable as previous changes localized mentions of nextOffset only
to leaf tuple definition and access macros. So I've done this renaming.
This looks like assert rather than real runtime check in spgLeafTupleSize()
+ if (state->includeTupdesc->natts + 1 >= INDEX_MAX_KEYS) + ereport(ERROR, + (errcode(ERRCODE_TOO_MANY_COLUMNS), + errmsg("number of index columns (%d) exceeds limit (%d)", + state->includeTupdesc->natts, INDEX_MAX_KEYS))); Even if you go with check, number of columns is state->includeTupdesc->natts + 1.
I placed this check only once on SpGist state creation and replaced the
other checks to asserts.
Also I'd refactor this
+ /* Form descriptor for INCLUDE columns if any */
Also done. Thanks a lot!
See v10.
--
Best regards,
Pavel Borisov
Postgres Professional: http://postgrespro.com <http://www.postgrespro.com>
Attachments:
v10-0001-Covering-SP-GiST-index-support-for-INCLUDE-colum.patchapplication/octet-stream; name=v10-0001-Covering-SP-GiST-index-support-for-INCLUDE-colum.patchDownload
From ea7f789574612d27007980022917cc7532899c53 Mon Sep 17 00:00:00 2001
From: Pavel Borisov <pashkin.elfe@gmail.com>
Date: Wed, 2 Sep 2020 19:38:49 +0400
Subject: [PATCH v10] Covering SP-GiST index - support for INCLUDE columns
Adding INCLUDE columns for SPGiST index is intended to increase the speed of queries by making scans index-only likewise
in btree and GiST index. These columns are added only to leaf tuples and they are not used in index tree search but they
can be fetched during index scan.
The other point of INCLUDE columns is to overcome SP-GiST limitation of being single-column in principle. I.e. in certain
cases a single covering SP-GiST index can replace several separate ones with less disk space and shared buffers
consumption, faster, update etc. Also, any data types without SP-GiST supported opclasses can be included.
Discussion: https://www.postgresql.org/message-id/flat/CALT9ZEFi-vMp4faht9f9Junb1nO3NOSjhpxTmbm1UGLMsLqiEQ@mail.gmail.com
---
doc/src/sgml/indices.sgml | 4 +-
doc/src/sgml/ref/create_index.sgml | 4 +-
doc/src/sgml/spgist.sgml | 8 +
src/backend/access/spgist/README | 21 +-
src/backend/access/spgist/spgdoinsert.c | 175 +++++---
src/backend/access/spgist/spginsert.c | 5 +-
src/backend/access/spgist/spgscan.c | 87 +++-
src/backend/access/spgist/spgutils.c | 381 ++++++++++++++++--
src/backend/access/spgist/spgvacuum.c | 25 +-
src/backend/access/spgist/spgxlog.c | 6 +-
src/include/access/spgist_private.h | 286 +++++++------
src/test/regress/expected/amutils.out | 4 +-
src/test/regress/expected/index_including.out | 3 +-
.../expected/index_including_spgist.out | 143 +++++++
src/test/regress/parallel_schedule | 2 +-
src/test/regress/serial_schedule | 1 +
src/test/regress/sql/index_including.sql | 2 +-
.../regress/sql/index_including_spgist.sql | 84 ++++
18 files changed, 995 insertions(+), 246 deletions(-)
create mode 100644 src/test/regress/expected/index_including_spgist.out
create mode 100644 src/test/regress/sql/index_including_spgist.sql
diff --git a/doc/src/sgml/indices.sgml b/doc/src/sgml/indices.sgml
index 28adaba72d..c89cc6cb08 100644
--- a/doc/src/sgml/indices.sgml
+++ b/doc/src/sgml/indices.sgml
@@ -1194,8 +1194,8 @@ CREATE UNIQUE INDEX tab_x_y ON tab(x) INCLUDE (y);
likely to not need to access the heap. If the heap tuple must be visited
anyway, it costs nothing more to get the column's value from there.
Other restrictions are that expressions are not currently supported as
- included columns, and that only B-tree and GiST indexes currently support
- included columns.
+ included columns, and that only B-tree, GiST and SP-GiST indexes currently
+ support included columns.
</para>
<para>
diff --git a/doc/src/sgml/ref/create_index.sgml b/doc/src/sgml/ref/create_index.sgml
index ff87b2d28f..3d360bcf47 100644
--- a/doc/src/sgml/ref/create_index.sgml
+++ b/doc/src/sgml/ref/create_index.sgml
@@ -187,8 +187,8 @@ CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ [ IF NOT EXISTS ] <replaceable class=
</para>
<para>
- Currently, the B-tree and the GiST index access methods support this
- feature. In B-tree and the GiST indexes, the values of columns listed
+ Currently, the B-tree, GiST and SP-GiST index access methods support
+ this feature. In these indexes, the values of columns listed
in the <literal>INCLUDE</literal> clause are included in leaf tuples
which correspond to heap tuples, but are not included in upper-level
index entries used for tree navigation.
diff --git a/doc/src/sgml/spgist.sgml b/doc/src/sgml/spgist.sgml
index 0e04a08679..868a140a6a 100644
--- a/doc/src/sgml/spgist.sgml
+++ b/doc/src/sgml/spgist.sgml
@@ -240,6 +240,14 @@
inner tuples that are passed through to reach the leaf level.
</para>
+ <para>
+ In case when <acronym>SP-GiST</acronym> index is created with
+ <literal>INCLUDE</literal> clause i.e. covering index, leaf tuples also
+ contain data from included columns. This data is stored uncompressed and can have
+ data types without any SP-GiST operator class.
+
+ </para>
+
<para>
Inner tuples are more complex, since they are branching points in the
search tree. Each inner tuple contains a set of one or more
diff --git a/src/backend/access/spgist/README b/src/backend/access/spgist/README
index b55b073832..55b515f03d 100644
--- a/src/backend/access/spgist/README
+++ b/src/backend/access/spgist/README
@@ -73,9 +73,22 @@ Leaf tuple consists of:
Example:
radix tree - the rest of string (postfix)
quad and k-d tree - the point itself
-
ItemPointer to the heap
-
+ nextOffset number of next leaf tuple in a chain on a leaf page
+ optional nullmask for INCLUDE columns
+ optional INCLUDE columns values
+
+Leaf tuple layout changed since PostgreSQL version 14 to support INCLUDE
+columns but in a way that doesn't change the header and the key value
+placement in a tuple. So indexes created earlier remain fully supported.
+
+Also it is intended to be laid out with minimum possible gaps to make index
+smaller. I.e. first header of 12 bytes, then a key value starting from
+maxalign boundary, then just immediately nulls mask bytes, then INCLUDE
+attributes each starting from its typealign boundary. So in many cases,
+nullmask is stored free of charge and tuple occupy minimum possible space
+(with exception of gap before key value which starts from maxalign for
+compatibility).
NULLS HANDLING
@@ -90,6 +103,10 @@ Insertions and searches in the nulls tree do not use any of the
opclass-supplied functions, but just use hardwired logic comparable to
AllTheSame cases in the normal tree.
+For INCLUDE attributes nulls are handled in ordinary per leaf-tuple way i.e.
+if null mask presence bit in a header is set, nullmask is added just after
+key value before the first INCLUDE attribute. Note that nullmask presence
+bit and nullmask itself apply only to INCLUDE attributes.
INSERTION ALGORITHM
diff --git a/src/backend/access/spgist/spgdoinsert.c b/src/backend/access/spgist/spgdoinsert.c
index 934d65b89f..335bbdb9dc 100644
--- a/src/backend/access/spgist/spgdoinsert.c
+++ b/src/backend/access/spgist/spgdoinsert.c
@@ -22,7 +22,7 @@
#include "miscadmin.h"
#include "storage/bufmgr.h"
#include "utils/rel.h"
-
+#include "access/htup_details.h"
/*
* SPPageDesc tracks all info about a page we are inserting into. In some
@@ -220,7 +220,7 @@ addLeafTuple(Relation index, SpGistState *state, SpGistLeafTuple leafTuple,
SpGistBlockIsRoot(current->blkno))
{
/* Tuple is not part of a chain */
- leafTuple->nextOffset = InvalidOffsetNumber;
+ SGLT_SET_OFFSET(leafTuple, InvalidOffsetNumber);
current->offnum = SpGistPageAddNewItem(state, current->page,
(Item) leafTuple, leafTuple->size,
NULL, false);
@@ -253,7 +253,7 @@ addLeafTuple(Relation index, SpGistState *state, SpGistLeafTuple leafTuple,
PageGetItemId(current->page, current->offnum));
if (head->tupstate == SPGIST_LIVE)
{
- leafTuple->nextOffset = head->nextOffset;
+ SGLT_SET_OFFSET(leafTuple, SGLT_GET_OFFSET(head));
offnum = SpGistPageAddNewItem(state, current->page,
(Item) leafTuple, leafTuple->size,
NULL, false);
@@ -264,14 +264,14 @@ addLeafTuple(Relation index, SpGistState *state, SpGistLeafTuple leafTuple,
*/
head = (SpGistLeafTuple) PageGetItem(current->page,
PageGetItemId(current->page, current->offnum));
- head->nextOffset = offnum;
+ SGLT_SET_OFFSET(head, offnum);
xlrec.offnumLeaf = offnum;
xlrec.offnumHeadLeaf = current->offnum;
}
else if (head->tupstate == SPGIST_DEAD)
{
- leafTuple->nextOffset = InvalidOffsetNumber;
+ SGLT_SET_OFFSET(leafTuple, InvalidOffsetNumber);
PageIndexTupleDelete(current->page, current->offnum);
if (PageAddItem(current->page,
(Item) leafTuple, leafTuple->size,
@@ -362,13 +362,13 @@ checkSplitConditions(Relation index, SpGistState *state,
{
/* We could see a DEAD tuple as first/only chain item */
Assert(i == current->offnum);
- Assert(it->nextOffset == InvalidOffsetNumber);
+ Assert(SGLT_GET_OFFSET(it) == InvalidOffsetNumber);
/* Don't count it in result, because it won't go to other page */
}
else
elog(ERROR, "unexpected SPGiST tuple state: %d", it->tupstate);
- i = it->nextOffset;
+ i = SGLT_GET_OFFSET(it);
}
*nToSplit = n;
@@ -437,7 +437,7 @@ moveLeafs(Relation index, SpGistState *state,
{
/* We could see a DEAD tuple as first/only chain item */
Assert(i == current->offnum);
- Assert(it->nextOffset == InvalidOffsetNumber);
+ Assert(SGLT_GET_OFFSET(it) == InvalidOffsetNumber);
/* We don't want to move it, so don't count it in size */
toDelete[nDelete] = i;
nDelete++;
@@ -446,7 +446,7 @@ moveLeafs(Relation index, SpGistState *state,
else
elog(ERROR, "unexpected SPGiST tuple state: %d", it->tupstate);
- i = it->nextOffset;
+ i = SGLT_GET_OFFSET(it);
}
/* Find a leaf page that will hold them */
@@ -475,7 +475,7 @@ moveLeafs(Relation index, SpGistState *state,
* don't care). We're modifying the tuple on the source page
* here, but it's okay since we're about to delete it.
*/
- it->nextOffset = r;
+ SGLT_SET_OFFSET(it, r);
r = SpGistPageAddNewItem(state, npage, (Item) it, it->size,
&startOffset, false);
@@ -490,7 +490,7 @@ moveLeafs(Relation index, SpGistState *state,
}
/* add the new tuple as well */
- newLeafTuple->nextOffset = r;
+ SGLT_SET_OFFSET(newLeafTuple, r);
r = SpGistPageAddNewItem(state, npage,
(Item) newLeafTuple, newLeafTuple->size,
&startOffset, false);
@@ -709,6 +709,11 @@ doPickSplit(Relation index, SpGistState *state,
int nToDelete,
nToInsert,
maxToInclude;
+ Datum *leafChainDatums;
+ bool *leafChainIsnulls;
+ const int natts = IndexRelationGetNumberOfAttributes(index);
+ int chainStoreIndex; /* Index for start of datums/isnulls for a
+ current chain item */
in.level = level;
@@ -723,14 +728,16 @@ doPickSplit(Relation index, SpGistState *state,
toInsert = (OffsetNumber *) palloc(sizeof(OffsetNumber) * n);
newLeafs = (SpGistLeafTuple *) palloc(sizeof(SpGistLeafTuple) * n);
leafPageSelect = (uint8 *) palloc(sizeof(uint8) * n);
-
STORE_STATE(state, xlrec.stateSrc);
+ leafChainDatums = (Datum *) palloc(n * natts * sizeof(Datum));
+ leafChainIsnulls = (bool *) palloc(n * natts * sizeof(bool));
+
/*
- * Form list of leaf tuples which will be distributed as split result;
- * also, count up the amount of space that will be freed from current.
- * (Note that in the non-root case, we won't actually delete the old
- * tuples, only replace them with redirects or placeholders.)
+ * Collect leaf tuples which will be distributed as split result; also,
+ * count up the amount of space that will be freed from current. (Note
+ * that in the non-root case, we won't actually delete the old tuples,
+ * only replace them with redirects or placeholders.)
*
* Note: the SGLTDATUM calls here are safe even when dealing with a nulls
* page. For a pass-by-value data type we will fetch a word that must
@@ -738,7 +745,15 @@ doPickSplit(Relation index, SpGistState *state,
* tuples must have size at least SGDTSIZE). For a pass-by-reference type
* we are just computing a pointer that isn't going to get dereferenced.
* So it's not worth guarding the calls with isNulls checks.
+ *
+ * Datums and isnulls of all leaf tuple attributes in the chain are
+ * collected into 2-d arrays: (number of tuples in the chain) x (number of
+ * attributes) The first attribute is key, the other - INCLUDE attributes (if
+ * any). After picksplit we need to form new leaf tuples as the key attribute
+ * length can change which can affect the alignment of every INCLUDE
+ * attribute.
*/
+
nToInsert = 0;
nToDelete = 0;
spaceToDelete = 0;
@@ -759,6 +774,9 @@ doPickSplit(Relation index, SpGistState *state,
{
in.datums[nToInsert] = SGLTDATUM(it, state);
heapPtrs[nToInsert] = it->heapPtr;
+ chainStoreIndex = nToInsert * natts;
+ spgDeformLeafTuple(it, state, &leafChainDatums[chainStoreIndex],
+ &leafChainIsnulls[chainStoreIndex], isNulls);
nToInsert++;
toDelete[nToDelete] = i;
nToDelete++;
@@ -784,6 +802,9 @@ doPickSplit(Relation index, SpGistState *state,
{
in.datums[nToInsert] = SGLTDATUM(it, state);
heapPtrs[nToInsert] = it->heapPtr;
+ chainStoreIndex = nToInsert * natts;
+ spgDeformLeafTuple(it, state, &leafChainDatums[chainStoreIndex],
+ &leafChainIsnulls[chainStoreIndex], isNulls);
nToInsert++;
toDelete[nToDelete] = i;
nToDelete++;
@@ -795,7 +816,7 @@ doPickSplit(Relation index, SpGistState *state,
{
/* We could see a DEAD tuple as first/only chain item */
Assert(i == current->offnum);
- Assert(it->nextOffset == InvalidOffsetNumber);
+ Assert(SGLT_GET_OFFSET(it) == InvalidOffsetNumber);
toDelete[nToDelete] = i;
nToDelete++;
/* replacing it with redirect will save no space */
@@ -803,7 +824,7 @@ doPickSplit(Relation index, SpGistState *state,
else
elog(ERROR, "unexpected SPGiST tuple state: %d", it->tupstate);
- i = it->nextOffset;
+ i = SGLT_GET_OFFSET(it);
}
}
in.nTuples = nToInsert;
@@ -816,10 +837,17 @@ doPickSplit(Relation index, SpGistState *state,
*/
in.datums[in.nTuples] = SGLTDATUM(newLeafTuple, state);
heapPtrs[in.nTuples] = newLeafTuple->heapPtr;
+ chainStoreIndex = in.nTuples * natts;
+ spgDeformLeafTuple(newLeafTuple, state, &leafChainDatums[chainStoreIndex],
+ &leafChainIsnulls[chainStoreIndex], isNulls);
in.nTuples++;
memset(&out, 0, sizeof(out));
+ /*
+ * Process collected key values of tuples from the chain. Included values
+ * are used to build fresh leaf tuples unchanged.
+ */
if (!isNulls)
{
/*
@@ -837,9 +865,13 @@ doPickSplit(Relation index, SpGistState *state,
totalLeafSizes = 0;
for (i = 0; i < in.nTuples; i++)
{
+ chainStoreIndex = i * natts;
+ leafChainDatums[chainStoreIndex] = (Datum) out.leafTupleDatums[i];
+ leafChainIsnulls[chainStoreIndex] = false;
+
newLeafs[i] = spgFormLeafTuple(state, heapPtrs + i,
- out.leafTupleDatums[i],
- false);
+ &leafChainDatums[chainStoreIndex],
+ &leafChainIsnulls[chainStoreIndex]);
totalLeafSizes += newLeafs[i]->size + sizeof(ItemIdData);
}
}
@@ -860,9 +892,16 @@ doPickSplit(Relation index, SpGistState *state,
totalLeafSizes = 0;
for (i = 0; i < in.nTuples; i++)
{
+ /*
+ * Nulls tree can contain only null key values.
+ */
+ chainStoreIndex = i * natts;
+ leafChainDatums[chainStoreIndex] = (Datum) 0;
+ leafChainIsnulls[chainStoreIndex] = true;
+
newLeafs[i] = spgFormLeafTuple(state, heapPtrs + i,
- (Datum) 0,
- true);
+ &leafChainDatums[chainStoreIndex],
+ &leafChainIsnulls[chainStoreIndex]);
totalLeafSizes += newLeafs[i]->size + sizeof(ItemIdData);
}
}
@@ -1196,10 +1235,10 @@ doPickSplit(Relation index, SpGistState *state,
if (ItemPointerIsValid(&nodes[n]->t_tid))
{
Assert(ItemPointerGetBlockNumber(&nodes[n]->t_tid) == leafBlock);
- it->nextOffset = ItemPointerGetOffsetNumber(&nodes[n]->t_tid);
+ SGLT_SET_OFFSET(it, ItemPointerGetOffsetNumber(&nodes[n]->t_tid));
}
else
- it->nextOffset = InvalidOffsetNumber;
+ SGLT_SET_OFFSET(it, InvalidOffsetNumber);
/* Insert it on page */
newoffset = SpGistPageAddNewItem(state, BufferGetPage(leafBuffer),
@@ -1889,67 +1928,83 @@ spgSplitNodeAction(Relation index, SpGistState *state,
*/
bool
spgdoinsert(Relation index, SpGistState *state,
- ItemPointer heapPtr, Datum datum, bool isnull)
+ ItemPointer heapPtr, Datum *datum, bool *isnull)
{
int level = 0;
- Datum leafDatum;
+ Datum *leafDatum;
int leafSize;
SPPageDesc current,
parent;
FmgrInfo *procinfo = NULL;
+ int i;
/*
* Look up FmgrInfo of the user-defined choose function once, to save
* cycles in the loop below.
*/
- if (!isnull)
+ if (!isnull[spgKeyColumn])
procinfo = index_getprocinfo(index, 1, SPGIST_CHOOSE_PROC);
/*
* Prepare the leaf datum to insert.
- *
+ */
+
+ leafDatum = (Datum *) palloc0(sizeof(Datum) * (IndexRelationGetNumberOfAttributes(index)));
+
+ /*
* If an optional "compress" method is provided, then call it to form the
- * leaf datum from the input datum. Otherwise store the input datum as
- * is. Since we don't use index_form_tuple in this AM, we have to make
- * sure value to be inserted is not toasted; FormIndexDatum doesn't
- * guarantee that. But we assume the "compress" method to return an
- * untoasted value.
+ * key datum from the input datum. Otherwise, store the input datum as is.
+ * Since we don't use index_form_tuple in this AM, we have to make sure
+ * value to be inserted is not toasted; FormIndexDatum doesn't guarantee
+ * that. But we assume the "compress" method to return an untoasted
+ * value.
*/
- if (!isnull)
+ if (!isnull[spgKeyColumn])
{
if (OidIsValid(index_getprocid(index, 1, SPGIST_COMPRESS_PROC)))
{
FmgrInfo *compressProcinfo = NULL;
compressProcinfo = index_getprocinfo(index, 1, SPGIST_COMPRESS_PROC);
- leafDatum = FunctionCall1Coll(compressProcinfo,
- index->rd_indcollation[0],
- datum);
+ leafDatum[spgKeyColumn] = FunctionCall1Coll(compressProcinfo,
+ index->rd_indcollation[0],
+ datum[spgKeyColumn]);
}
else
{
Assert(state->attLeafType.type == state->attType.type);
if (state->attType.attlen == -1)
- leafDatum = PointerGetDatum(PG_DETOAST_DATUM(datum));
+ leafDatum[spgKeyColumn] = PointerGetDatum(PG_DETOAST_DATUM(datum[spgKeyColumn]));
else
- leafDatum = datum;
+ leafDatum[spgKeyColumn] = datum[spgKeyColumn];
}
}
else
- leafDatum = (Datum) 0;
+ leafDatum[spgKeyColumn] = (Datum) 0;
+
+ for (i = 1; i < IndexRelationGetNumberOfAttributes(index); i++)
+ {
+ if (!isnull[i])
+ {
+ if (TupleDescAttr(state->includeTupdesc, i - 1)->attlen == -1)
+ leafDatum[i] = PointerGetDatum(PG_DETOAST_DATUM(datum[i]));
+ else
+ leafDatum[i] = datum[i];
+ }
+ else
+ leafDatum[i] = (Datum) 0;
+ }
+
/*
- * Compute space needed for a leaf tuple containing the given datum.
+ * Compute space needed on a page for a leaf tuple containing the given
+ * datum.
*
* If it isn't gonna fit, and the opclass can't reduce the datum size by
* suffixing, bail out now rather than getting into an endless loop.
*/
- if (!isnull)
- leafSize = SGLTHDRSZ + sizeof(ItemIdData) +
- SpGistGetTypeSize(&state->attLeafType, leafDatum);
- else
- leafSize = SGDTSIZE + sizeof(ItemIdData);
+ leafSize = spgLeafTupleSize(state, leafDatum, isnull) + sizeof(ItemIdData);
if (leafSize > SPGIST_PAGE_CAPACITY && !state->config.longValuesOK)
ereport(ERROR,
@@ -1961,7 +2016,7 @@ spgdoinsert(Relation index, SpGistState *state,
errhint("Values larger than a buffer page cannot be indexed.")));
/* Initialize "current" to the appropriate root page */
- current.blkno = isnull ? SPGIST_NULL_BLKNO : SPGIST_ROOT_BLKNO;
+ current.blkno = isnull[spgKeyColumn] ? SPGIST_NULL_BLKNO : SPGIST_ROOT_BLKNO;
current.buffer = InvalidBuffer;
current.page = NULL;
current.offnum = FirstOffsetNumber;
@@ -1995,7 +2050,7 @@ spgdoinsert(Relation index, SpGistState *state,
*/
current.buffer =
SpGistGetBuffer(index,
- GBUF_LEAF | (isnull ? GBUF_NULLS : 0),
+ GBUF_LEAF | (isnull[spgKeyColumn] ? GBUF_NULLS : 0),
Min(leafSize, SPGIST_PAGE_CAPACITY),
&isNew);
current.blkno = BufferGetBlockNumber(current.buffer);
@@ -2037,7 +2092,7 @@ spgdoinsert(Relation index, SpGistState *state,
current.page = BufferGetPage(current.buffer);
/* should not arrive at a page of the wrong type */
- if (isnull ? !SpGistPageStoresNulls(current.page) :
+ if (isnull[spgKeyColumn] ? !SpGistPageStoresNulls(current.page) :
SpGistPageStoresNulls(current.page))
elog(ERROR, "SPGiST index page %u has wrong nulls flag",
current.blkno);
@@ -2054,7 +2109,7 @@ spgdoinsert(Relation index, SpGistState *state,
{
/* it fits on page, so insert it and we're done */
addLeafTuple(index, state, leafTuple,
- ¤t, &parent, isnull, isNew);
+ ¤t, &parent, isnull[spgKeyColumn], isNew);
break;
}
else if ((sizeToSplit =
@@ -2068,14 +2123,14 @@ spgdoinsert(Relation index, SpGistState *state,
* chain to another leaf page rather than splitting it.
*/
Assert(!isNew);
- moveLeafs(index, state, ¤t, &parent, leafTuple, isnull);
+ moveLeafs(index, state, ¤t, &parent, leafTuple, isnull[spgKeyColumn]);
break; /* we're done */
}
else
{
/* picksplit */
if (doPickSplit(index, state, ¤t, &parent,
- leafTuple, level, isnull, isNew))
+ leafTuple, level, isnull[spgKeyColumn], isNew))
break; /* doPickSplit installed new tuples */
/* leaf tuple will not be inserted yet */
@@ -2110,8 +2165,8 @@ spgdoinsert(Relation index, SpGistState *state,
innerTuple = (SpGistInnerTuple) PageGetItem(current.page,
PageGetItemId(current.page, current.offnum));
- in.datum = datum;
- in.leafDatum = leafDatum;
+ in.datum = datum[spgKeyColumn];
+ in.leafDatum = leafDatum[spgKeyColumn];
in.level = level;
in.allTheSame = innerTuple->allTheSame;
in.hasPrefix = (innerTuple->prefixSize > 0);
@@ -2121,7 +2176,7 @@ spgdoinsert(Relation index, SpGistState *state,
memset(&out, 0, sizeof(out));
- if (!isnull)
+ if (!isnull[spgKeyColumn])
{
/* use user-defined choose method */
FunctionCall2Coll(procinfo,
@@ -2158,11 +2213,11 @@ spgdoinsert(Relation index, SpGistState *state,
/* Adjust level as per opclass request */
level += out.result.matchNode.levelAdd;
/* Replace leafDatum and recompute leafSize */
- if (!isnull)
+ if (!isnull[spgKeyColumn])
{
- leafDatum = out.result.matchNode.restDatum;
- leafSize = SGLTHDRSZ + sizeof(ItemIdData) +
- SpGistGetTypeSize(&state->attLeafType, leafDatum);
+ leafDatum[spgKeyColumn] = out.result.matchNode.restDatum;
+ leafSize = spgLeafTupleSize(state, leafDatum, isnull) +
+ sizeof(ItemIdData);
}
/*
@@ -2227,6 +2282,6 @@ spgdoinsert(Relation index, SpGistState *state,
SpGistSetLastUsedPage(index, parent.buffer);
UnlockReleaseBuffer(parent.buffer);
}
-
+ pfree(leafDatum);
return true;
}
diff --git a/src/backend/access/spgist/spginsert.c b/src/backend/access/spgist/spginsert.c
index e4508a2b92..b54ae85f6e 100644
--- a/src/backend/access/spgist/spginsert.c
+++ b/src/backend/access/spgist/spginsert.c
@@ -55,8 +55,7 @@ spgistBuildCallback(Relation index, ItemPointer tid, Datum *values,
* lock on some buffer. So we need to be willing to retry. We can flush
* any temp data when retrying.
*/
- while (!spgdoinsert(index, &buildstate->spgstate, tid,
- *values, *isnull))
+ while (!spgdoinsert(index, &buildstate->spgstate, tid, values, isnull))
{
MemoryContextReset(buildstate->tmpCtx);
}
@@ -226,7 +225,7 @@ spginsert(Relation index, Datum *values, bool *isnull,
* to avoid cumulative memory consumption. That means we also have to
* redo initSpGistState(), but it's cheap enough not to matter.
*/
- while (!spgdoinsert(index, &spgstate, ht_ctid, *values, *isnull))
+ while (!spgdoinsert(index, &spgstate, ht_ctid, values, isnull))
{
MemoryContextReset(insertCtx);
initSpGistState(&spgstate, index);
diff --git a/src/backend/access/spgist/spgscan.c b/src/backend/access/spgist/spgscan.c
index 4d506bfb9a..aff130f78a 100644
--- a/src/backend/access/spgist/spgscan.c
+++ b/src/backend/access/spgist/spgscan.c
@@ -28,7 +28,8 @@
typedef void (*storeRes_func) (SpGistScanOpaque so, ItemPointer heapPtr,
Datum leafValue, bool isNull, bool recheck,
- bool recheckDistances, double *distances);
+ bool recheckDistances, double *distances,
+ SpGistLeafTuple leafTuple);
/*
* Pairing heap comparison function for the SpGistSearchItem queue.
@@ -88,6 +89,9 @@ spgFreeSearchItem(SpGistScanOpaque so, SpGistSearchItem *item)
if (item->traversalValue)
pfree(item->traversalValue);
+ if (item->isLeaf && item->leafTuple)
+ pfree(item->leafTuple);
+
pfree(item);
}
@@ -134,6 +138,8 @@ spgAddStartItem(SpGistScanOpaque so, bool isnull)
startEntry->recheck = false;
startEntry->recheckDistances = false;
+ startEntry->leafTuple = NULL;
+
spgAddSearchItemToQueue(so, startEntry);
}
@@ -438,14 +444,30 @@ spgendscan(IndexScanDesc scan)
* Leaf SpGistSearchItem constructor, called in queue context
*/
static SpGistSearchItem *
-spgNewHeapItem(SpGistScanOpaque so, int level, ItemPointer heapPtr,
+spgNewHeapItem(SpGistScanOpaque so, int level, SpGistLeafTuple leafTuple,
Datum leafValue, bool recheck, bool recheckDistances,
bool isnull, double *distances)
{
SpGistSearchItem *item = spgAllocSearchItem(so, isnull, distances);
+ /*
+ * If there are INCLUDE attributes search item in the queue should contain
+ * them.
+ */
+ if (so->state.includeTupdesc)
+ {
+ Assert(so->state.includeTupdesc->natts);
+
+ item->leafTuple = palloc(leafTuple->size);
+ memcpy(item->leafTuple, leafTuple, leafTuple->size);
+ }
+ else
+ {
+ item->leafTuple = NULL;
+ }
+
item->level = level;
- item->heapPtr = *heapPtr;
+ item->heapPtr = leafTuple->heapPtr;
/* copy value to queue cxt out of tmp cxt */
item->value = isnull ? (Datum) 0 :
datumCopy(leafValue, so->state.attLeafType.attbyval,
@@ -503,6 +525,8 @@ spgLeafTest(SpGistScanOpaque so, SpGistSearchItem *item,
in.returnData = so->want_itup;
in.leafDatum = SGLTDATUM(leafTuple, &so->state);
+
+
out.leafValue = (Datum) 0;
out.recheck = false;
out.distances = NULL;
@@ -528,7 +552,7 @@ spgLeafTest(SpGistScanOpaque so, SpGistSearchItem *item,
/* the scan is ordered -> add the item to the queue */
MemoryContext oldCxt = MemoryContextSwitchTo(so->traversalCxt);
SpGistSearchItem *heapItem = spgNewHeapItem(so, item->level,
- &leafTuple->heapPtr,
+ leafTuple,
leafValue,
recheck,
recheckDistances,
@@ -543,8 +567,10 @@ spgLeafTest(SpGistScanOpaque so, SpGistSearchItem *item,
{
/* non-ordered scan, so report the item right away */
Assert(!recheckDistances);
+
storeRes(so, &leafTuple->heapPtr, leafValue, isnull,
- recheck, false, NULL);
+ recheck, false, NULL, leafTuple);
+
*reportedSome = true;
}
}
@@ -736,7 +762,7 @@ spgTestLeafTuple(SpGistScanOpaque so,
/* dead tuple should be first in chain */
Assert(offset == ItemPointerGetOffsetNumber(&item->heapPtr));
/* No live entries on this page */
- Assert(leafTuple->nextOffset == InvalidOffsetNumber);
+ Assert(SGLT_GET_OFFSET(leafTuple) == InvalidOffsetNumber);
return SpGistBreakOffsetNumber;
}
}
@@ -750,7 +776,7 @@ spgTestLeafTuple(SpGistScanOpaque so,
spgLeafTest(so, item, leafTuple, isnull, reportedSome, storeRes);
- return leafTuple->nextOffset;
+ return SGLT_GET_OFFSET(leafTuple);
}
/*
@@ -782,8 +808,8 @@ redirect:
{
/* We store heap items in the queue only in case of ordered search */
Assert(so->numberOfNonNullOrderBys > 0);
- storeRes(so, &item->heapPtr, item->value, item->isNull,
- item->recheck, item->recheckDistances, item->distances);
+ storeRes(so, &item->heapPtr, item->value, item->isNull, item->recheck,
+ item->recheckDistances, item->distances, item->leafTuple);
reportedSome = true;
}
else
@@ -877,7 +903,7 @@ redirect:
static void
storeBitmap(SpGistScanOpaque so, ItemPointer heapPtr,
Datum leafValue, bool isnull, bool recheck, bool recheckDistances,
- double *distances)
+ double *distances, SpGistLeafTuple leafTuple)
{
Assert(!recheckDistances && !distances);
tbm_add_tuples(so->tbm, heapPtr, 1, recheck);
@@ -904,7 +930,7 @@ spggetbitmap(IndexScanDesc scan, TIDBitmap *tbm)
static void
storeGettuple(SpGistScanOpaque so, ItemPointer heapPtr,
Datum leafValue, bool isnull, bool recheck, bool recheckDistances,
- double *nonNullDistances)
+ double *nonNullDistances, SpGistLeafTuple leafTuple)
{
Assert(so->nPtrs < MaxIndexTuplesPerPage);
so->heapPtrs[so->nPtrs] = *heapPtr;
@@ -949,9 +975,38 @@ storeGettuple(SpGistScanOpaque so, ItemPointer heapPtr,
* Reconstruct index data. We have to copy the datum out of the temp
* context anyway, so we may as well create the tuple here.
*/
- so->reconTups[so->nPtrs] = heap_form_tuple(so->indexTupDesc,
- &leafValue,
- &isnull);
+ if (so->state.includeTupdesc)
+ {
+ /* Add INCLUDE attributes */
+ Datum *leafDatums;
+ bool *leafIsnulls;
+
+ Assert(so->state.includeTupdesc->natts);
+
+ leafDatums = (Datum *) palloc(sizeof(Datum) * (so->state.includeTupdesc->natts + 1));
+ leafIsnulls = (bool *) palloc(sizeof(bool) * (so->state.includeTupdesc->natts + 1));
+
+ spgDeformLeafTuple(leafTuple, &so->state, leafDatums, leafIsnulls, isnull);
+
+ /*
+ * override key value extracted from LeafTuple in case we've
+ * reconstructed it already
+ */
+ leafDatums[spgKeyColumn] = leafValue;
+ leafIsnulls[spgKeyColumn] = isnull;
+
+ so->reconTups[so->nPtrs] = heap_form_tuple(so->indexTupDesc,
+ leafDatums,
+ leafIsnulls);
+ pfree(leafDatums);
+ pfree(leafIsnulls);
+ }
+ else
+ {
+ so->reconTups[so->nPtrs] = heap_form_tuple(so->indexTupDesc,
+ &leafValue,
+ &isnull);
+ }
}
so->nPtrs++;
}
@@ -1019,6 +1074,10 @@ spgcanreturn(Relation index, int attno)
{
SpGistCache *cache;
+ /* INCLUDE attributes can always be fetched for index-only scans */
+ if (attno > 1)
+ return true;
+
/* We can do it if the opclass config function says so */
cache = spgGetCache(index);
diff --git a/src/backend/access/spgist/spgutils.c b/src/backend/access/spgist/spgutils.c
index 0efe05e552..cbe4012074 100644
--- a/src/backend/access/spgist/spgutils.c
+++ b/src/backend/access/spgist/spgutils.c
@@ -31,7 +31,18 @@
#include "utils/index_selfuncs.h"
#include "utils/lsyscache.h"
#include "utils/syscache.h"
+#include "access/itup.h"
+#include "access/detoast.h"
+#include "access/toast_internals.h"
+#include "access/heaptoast.h"
+#include "utils/expandeddatum.h"
+/* Does att's datatype allow packing into the 1-byte-header varlena format? */
+#define ATT_IS_PACKABLE(att) \
+ ((att)->attlen == -1 && (att)->attstorage != TYPSTORAGE_PLAIN)
+
+Size spgIncludedDataSize(TupleDesc tupleDesc, Datum *values,
+ bool *isnull, Size start);
/*
* SP-GiST handler function: return IndexAmRoutine with access method parameters
@@ -49,7 +60,7 @@ spghandler(PG_FUNCTION_ARGS)
amroutine->amcanorderbyop = true;
amroutine->amcanbackward = false;
amroutine->amcanunique = false;
- amroutine->amcanmulticol = false;
+ amroutine->amcanmulticol = true;
amroutine->amoptionalkey = true;
amroutine->amsearcharray = false;
amroutine->amsearchnulls = true;
@@ -57,7 +68,7 @@ spghandler(PG_FUNCTION_ARGS)
amroutine->amclusterable = false;
amroutine->ampredlocks = false;
amroutine->amcanparallel = false;
- amroutine->amcaninclude = false;
+ amroutine->amcaninclude = true;
amroutine->amusemaintenanceworkmem = false;
amroutine->amparallelvacuumoptions =
VACUUM_OPTION_PARALLEL_BULKDEL | VACUUM_OPTION_PARALLEL_COND_CLEANUP;
@@ -104,6 +115,7 @@ SpGistCache *
spgGetCache(Relation index)
{
SpGistCache *cache;
+ int i;
if (index->rd_amcache == NULL)
{
@@ -116,14 +128,26 @@ spgGetCache(Relation index)
cache = MemoryContextAllocZero(index->rd_indexcxt,
sizeof(SpGistCache));
- /* SPGiST doesn't support multi-column indexes */
- Assert(index->rd_att->natts == 1);
+ /*
+ * SPGiST should have one key column and can also have INCLUDE
+ * columns
+ */
+ if (IndexRelationGetNumberOfKeyAttributes(index) != 1)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("SPGiST index can have only one key column")));
+ if (IndexRelationGetNumberOfAttributes(index) >= INDEX_MAX_KEYS)
+ ereport(ERROR,
+ (errcode(ERRCODE_TOO_MANY_COLUMNS),
+ errmsg("number of index columns (%d) exceeds limit (%d)",
+ IndexRelationGetNumberOfAttributes(index), INDEX_MAX_KEYS)));
/*
- * Get the actual data type of the indexed column from the index
- * tupdesc. We pass this to the opclass config function so that
- * polymorphic opclasses are possible.
+ * Get the actual data type of the key column from the index tupdesc.
+ * We pass this to the opclass config function so that polymorphic
+ * opclasses are possible.
*/
+
atttype = TupleDescAttr(index->rd_att, 0)->atttypid;
/* Call the config function to get config info for the opclass */
@@ -156,6 +180,7 @@ spgGetCache(Relation index)
fillTypeDesc(&cache->attPrefixType, cache->config.prefixType);
fillTypeDesc(&cache->attLabelType, cache->config.labelType);
+
/* Last, get the lastUsedPages data from the metapage */
metabuffer = ReadBuffer(index, SPGIST_METAPAGE_BLKNO);
LockBuffer(metabuffer, BUFFER_LOCK_SHARE);
@@ -178,6 +203,18 @@ spgGetCache(Relation index)
cache = (SpGistCache *) index->rd_amcache;
}
+ /* Form descriptor for INCLUDE columns if any */
+ cache->includeTupdesc = NULL;
+ for (i = 0; i < IndexRelationGetNumberOfAttributes(index) - 1; i++)
+ {
+ if (cache->includeTupdesc == NULL)
+ cache->includeTupdesc = CreateTemplateTupleDesc(
+ IndexRelationGetNumberOfAttributes(index) - 1);
+
+ TupleDescInitEntry(cache->includeTupdesc, i + 1, NULL,
+ TupleDescAttr(index->rd_att, i + 1)->atttypid, -1, 0);
+ }
+
return cache;
}
@@ -190,6 +227,7 @@ initSpGistState(SpGistState *state, Relation index)
/* Get cached static information about index */
cache = spgGetCache(index);
+ state->includeTupdesc = cache->includeTupdesc;
state->config = cache->config;
state->attType = cache->attType;
state->attLeafType = cache->attLeafType;
@@ -603,8 +641,8 @@ spgoptions(Datum reloptions, bool validate)
/*
* Get the space needed to store a non-null datum of the indicated type.
- * Note the result is already rounded up to a MAXALIGN boundary.
- * Also, we follow the SPGiST convention that pass-by-val types are
+ * Note the result is not maxaligned and this should be done by the caller if
+ * needed. Also, we follow the SPGiST convention that pass-by-val types are
* just stored in their Datum representation (compare memcpyDatum).
*/
unsigned int
@@ -619,7 +657,7 @@ SpGistGetTypeSize(SpGistTypeDesc *att, Datum datum)
else
size = VARSIZE_ANY(datum);
- return MAXALIGN(size);
+ return size;
}
/*
@@ -642,36 +680,197 @@ memcpyDatum(void *target, SpGistTypeDesc *att, Datum datum)
}
/*
- * Construct a leaf tuple containing the given heap TID and datum value
+ * Private version of heap_compute_data_size with start address not
+ * at MAXALIGN boundary. The reason is that start address (and alignment)
+ * influence alignment of each of next values and overall size of INCLUDE
+ * data area in SpGiST leaf tuple. MAXALINGing first INCLUDE attribute is
+ * avoided for not to introduce unnecessary gap before it.
+ */
+Size
+spgIncludedDataSize(TupleDesc tupleDesc,
+ Datum *values,
+ bool *isnull, Size start)
+{
+ Size data_length = 0;
+ int i;
+ int numberOfAttributes = tupleDesc->natts;
+
+ data_length = start;
+ for (i = 0; i < numberOfAttributes; i++)
+ {
+ Datum val;
+ Form_pg_attribute atti;
+
+ if (isnull[i])
+ continue;
+
+ val = values[i];
+ atti = TupleDescAttr(tupleDesc, i);
+
+ if (ATT_IS_PACKABLE(atti) &&
+ VARATT_CAN_MAKE_SHORT(DatumGetPointer(val)))
+ {
+ /*
+ * we're anticipating converting to a short varlena header, so
+ * adjust length and don't count any alignment
+ */
+ data_length += VARATT_CONVERTED_SHORT_SIZE(DatumGetPointer(val));
+ }
+ else if (atti->attlen == -1 &&
+ VARATT_IS_EXTERNAL_EXPANDED(DatumGetPointer(val)))
+ {
+ /*
+ * we want to flatten the expanded value so that the constructed
+ * tuple doesn't depend on it
+ */
+ data_length = att_align_nominal(data_length, atti->attalign);
+ data_length += EOH_get_flat_size(DatumGetEOHP(val));
+ }
+ else
+ {
+ data_length = att_align_datum(data_length, atti->attalign,
+ atti->attlen, val);
+ data_length = att_addlength_datum(data_length, atti->attlen,
+ val);
+ }
+ }
+ return data_length - start;
+}
+
+/* Calculate overall leaf tuple size. SGLTHDRSZ is MAXALIGNed for backward
+ * compatibility and there might be a gap between header and key data. After
+ * key data there are no such gaps more than is is necessary for each value
+ * alignment. Overall result is MAXALIGNed which is anyway unavoidable
+ * when placing a tuple on a page.
+ */
+unsigned int
+spgLeafTupleSize(SpGistState *state, Datum *datum, bool *isnull)
+{
+ /* compute space needed, nullmask size and offset for INCLUDE attributes */
+ unsigned int size = SGLTHDRSZ;
+ unsigned int i;
+
+ if (!isnull[spgKeyColumn])
+ /* key attribute size (not maxaligned) */
+ size += SpGistGetTypeSize(&state->attLeafType, datum[spgKeyColumn]);
+
+ if (state->includeTupdesc)
+ {
+ Assert(state->includeTupdesc->natts);
+ Assert(state->includeTupdesc->natts + 1 <= INDEX_MAX_KEYS);
+ /* nullmask size */
+ for (i = 1; i <= state->includeTupdesc->natts; i++)
+ {
+ if (isnull[i])
+ {
+ size += (state->includeTupdesc->natts / 8) + 1;
+ break;
+ }
+ }
+ /* overall INCLUDE attributes size each with added proper alignment. */
+ size += spgIncludedDataSize(state->includeTupdesc, datum + 1, isnull + 1, size);
+ }
+ return MAXALIGN(size);
+}
+
+/*
+ * Construct a leaf tuple containing the given heap TID, key data and INCLUDE
+ * columns data. Key data starts from MAXALIGN boundary for backward compatibility.
+ * Nullmask apply only to INCLUDE attributes and is placed just after key data if
+ * there is at least one NULL among INCLUDE attributes. It doesn't need alignment.
+ * Then all INCLUDE columns data follow aligned by their typealign-s.
*/
SpGistLeafTuple
spgFormLeafTuple(SpGistState *state, ItemPointer heapPtr,
- Datum datum, bool isnull)
+ Datum *datum, bool *isnull)
{
SpGistLeafTuple tup;
- unsigned int size;
+ unsigned int size = SGLTHDRSZ;
+ unsigned int include_offset = 0;
+ unsigned int nullmask_size = 0;
+ unsigned int data_offset = 0;
+ unsigned int data_size = 0;
+ uint16 tupmask = 0;
+ int i;
- /* compute space needed (note result is already maxaligned) */
- size = SGLTHDRSZ;
- if (!isnull)
- size += SpGistGetTypeSize(&state->attLeafType, datum);
+ /*
+ * Calculate space needed. If there are INCLUDE attributes also calculate
+ * sizes and offsets needed for heap_fill_tuple
+ */
+ if (!isnull[spgKeyColumn])
+ /* key attribute size (not maxaligned) */
+ size += SpGistGetTypeSize(&state->attLeafType, datum[spgKeyColumn]);
+
+ if (state->includeTupdesc)
+ {
+ Assert(state->includeTupdesc->natts);
+ Assert(state->includeTupdesc->natts + 1 <= INDEX_MAX_KEYS);
+
+ include_offset = size;
+
+ for (i = 1; i <= state->includeTupdesc->natts; i++)
+ {
+ if (isnull[i])
+ {
+ nullmask_size = (state->includeTupdesc->natts / 8) + 1;
+ size += nullmask_size;
+ break;
+ }
+ }
+
+ /*
+ * Alignment of all INCLUDE attributes is counted inside data_size.
+ * data_offset itself is not aligned.
+ */
+ data_size = spgIncludedDataSize(state->includeTupdesc, datum + 1, isnull + 1, size);
+ data_offset = size;
+
+ size += data_size;
+ }
/*
- * Ensure that we can replace the tuple with a dead tuple later. This
- * test is unnecessary when !isnull, but let's be safe.
+ * Ensure that we can replace the tuple with a dead tuple later. This
+ * test is unnecessary when !isnull[spgKeyColumn], but let's be safe.
*/
if (size < SGDTSIZE)
size = SGDTSIZE;
/* OK, form the tuple */
- tup = (SpGistLeafTuple) palloc0(size);
+ tup = (SpGistLeafTuple) palloc0(MAXALIGN(size));
- tup->size = size;
- tup->nextOffset = InvalidOffsetNumber;
+ tup->size = MAXALIGN(size);
+ SGLT_SET_OFFSET(tup, InvalidOffsetNumber);
tup->heapPtr = *heapPtr;
- if (!isnull)
- memcpyDatum(SGLTDATAPTR(tup), &state->attLeafType, datum);
+ if (!isnull[spgKeyColumn])
+ memcpyDatum(SGLTDATAPTR(tup), &state->attLeafType, datum[spgKeyColumn]);
+
+ /* Add INCLUDE columns data to leaf tuple if any. */
+ if (state->includeTupdesc)
+ {
+ /*
+ * The start of INCLUDE attributes tuple (include_offset) is next
+ * byte after end of a key value and is not required to be aligned.
+ * Nullmask is included without alignment and values alignment are
+ * done by heap_fill_tuple() automatically.
+ */
+ heap_fill_tuple(state->includeTupdesc, datum + 1, isnull + 1,
+ (char *) tup + data_offset,
+ data_size, &tupmask,
+ (nullmask_size ? (bits8 *) tup + include_offset : NULL));
+
+ if (nullmask_size)
+ SGLT_SET_CONTAINSNULLMASK(tup, true);
+
+ /*
+ * We do this because heap_fill_tuple wants to initialize a "tupmask"
+ * which is used for HeapTuples, but the only relevant info is the
+ * "has variable attributes" field. We have already set the hasnull
+ * bit above.
+ */
+ if (tupmask & HEAP_HASVARWIDTH)
+ SGLT_SET_CONTAINSVARATT(tup, true);
+ }
return tup;
}
@@ -688,10 +887,10 @@ spgFormNodeTuple(SpGistState *state, Datum label, bool isnull)
unsigned int size;
unsigned short infomask = 0;
- /* compute space needed (note result is already maxaligned) */
+ /* compute space needed */
size = SGNTHDRSZ;
if (!isnull)
- size += SpGistGetTypeSize(&state->attLabelType, label);
+ size += MAXALIGN(SpGistGetTypeSize(&state->attLabelType, label));
/*
* Here we make sure that the size will fit in the field reserved for it
@@ -735,7 +934,7 @@ spgFormInnerTuple(SpGistState *state, bool hasPrefix, Datum prefix,
/* Compute size needed */
if (hasPrefix)
- prefixSize = SpGistGetTypeSize(&state->attPrefixType, prefix);
+ prefixSize = MAXALIGN(SpGistGetTypeSize(&state->attPrefixType, prefix));
else
prefixSize = 0;
@@ -814,7 +1013,7 @@ spgFormDeadTuple(SpGistState *state, int tupstate,
tuple->tupstate = tupstate;
tuple->size = SGDTSIZE;
- tuple->nextOffset = InvalidOffsetNumber;
+ tuple->t_info = InvalidOffsetNumber;
if (tupstate == SPGIST_REDIRECT)
{
@@ -1046,3 +1245,129 @@ spgproperty(Oid index_oid, int attno,
return true;
}
+
+/*
+ * Convert an SpGist tuple into palloc'd Datum/isnull arrays.
+ *
+ */
+void
+spgDeformLeafTuple(SpGistLeafTuple tup, SpGistState *state, Datum *datum, bool *isnull,
+ bool key_isnull)
+{
+ unsigned int include_offset; /* offset of INCLUDE data */
+ int off;
+ bits8 *nullmask_ptr = NULL; /* ptr to null bitmap in tuple */
+ char *tp;
+ bool slow = false; /* can we use/set attcacheoff? */
+ int i;
+
+ if (key_isnull)
+ {
+ datum[spgKeyColumn] = (Datum) 0;
+ isnull[spgKeyColumn] = true;
+ }
+ else
+ {
+ datum[spgKeyColumn] = SGLTDATUM(tup, state);
+ isnull[spgKeyColumn] = false;
+ }
+
+ if (state->includeTupdesc)
+ {
+ Assert(state->includeTupdesc->natts);
+ Assert(state->includeTupdesc->natts + 1 <= INDEX_MAX_KEYS);
+
+ include_offset = key_isnull ? SGLTHDRSZ : SGLTHDRSZ + SpGistGetTypeSize(&state->attLeafType, datum[spgKeyColumn]);
+
+ tp = (char *) tup;
+ off = include_offset;
+
+ if (SGLT_GET_CONTAINSNULLMASK(tup))
+ {
+ nullmask_ptr = (bits8 *) tp + include_offset;
+ off += (state->includeTupdesc->natts) / 8 + 1;
+ }
+
+ if (state->attLeafType.attlen > 0 && !SGLT_GET_CONTAINSVARATT(tup) &&
+ !SGLT_GET_CONTAINSNULLMASK(tup))
+ /* can use attcacheoff for all attributes */
+ {
+ for (i = 1; i <= state->includeTupdesc->natts; i++)
+ {
+ Form_pg_attribute thisatt = TupleDescAttr(state->includeTupdesc, i - 1);
+
+ isnull[i] = false;
+ if (thisatt->attcacheoff >= 0)
+ off = thisatt->attcacheoff;
+ else
+ {
+ off = att_align_nominal(off, thisatt->attalign);
+ thisatt->attcacheoff = off;
+ }
+ datum[i] = fetchatt(thisatt, tp + off);
+ off = att_addlength_pointer(off, thisatt->attlen, tp + off);
+ }
+ }
+ else
+
+ /*
+ * general case: can use cache until first null or varlen
+ * attribute
+ */
+ {
+ if (state->attLeafType.attlen <= 0)
+ slow = true; /* can't use attcacheoff at all */
+
+ for (i = 1; i <= state->includeTupdesc->natts; i++)
+ {
+ Form_pg_attribute thisatt = TupleDescAttr(state->includeTupdesc, i - 1);
+
+ if (SGLT_GET_CONTAINSNULLMASK(tup))
+ {
+ if (att_isnull(i - 1, nullmask_ptr))
+ {
+ datum[i] = (Datum) 0;
+ isnull[i] = true;
+ slow = true; /* can't use attcacheoff anymore */
+ continue;
+ }
+ }
+
+ isnull[i] = false;
+
+ if (!slow && thisatt->attcacheoff >= 0)
+ off = thisatt->attcacheoff;
+ else if (thisatt->attlen == -1)
+ {
+ /*
+ * We can only cache the offset for a varlena attribute if
+ * the offset is already suitably aligned, so that there
+ * would be no pad bytes in any case: then the offset will
+ * be valid for either an aligned or unaligned value.
+ */
+ if (!slow && off == att_align_nominal(off, thisatt->attalign))
+ thisatt->attcacheoff = off;
+ else
+ {
+ off = att_align_pointer(off, thisatt->attalign, -1, tp + off);
+ slow = true;
+ }
+ }
+ else
+ {
+ /* not varlena, so safe to use att_align_nominal */
+ off = att_align_nominal(off, thisatt->attalign);
+
+ if (!slow)
+ thisatt->attcacheoff = off;
+ }
+
+ datum[i] = fetchatt(thisatt, tp + off);
+ off = att_addlength_pointer(off, thisatt->attlen, tp + off);
+
+ if (thisatt->attlen <= 0)
+ slow = true; /* can't use attcacheoff anymore */
+ }
+ }
+ }
+}
diff --git a/src/backend/access/spgist/spgvacuum.c b/src/backend/access/spgist/spgvacuum.c
index bd98707f3c..f23f9d0b1e 100644
--- a/src/backend/access/spgist/spgvacuum.c
+++ b/src/backend/access/spgist/spgvacuum.c
@@ -168,23 +168,28 @@ vacuumLeafPage(spgBulkDeleteState *bds, Relation index, Buffer buffer,
}
/* Form predecessor map, too */
- if (lt->nextOffset != InvalidOffsetNumber)
+ if (SGLT_GET_OFFSET(lt) != InvalidOffsetNumber)
{
/* paranoia about corrupted chain links */
- if (lt->nextOffset < FirstOffsetNumber ||
- lt->nextOffset > max ||
- predecessor[lt->nextOffset] != InvalidOffsetNumber)
+ if (SGLT_GET_OFFSET(lt) < FirstOffsetNumber ||
+ SGLT_GET_OFFSET(lt) > max ||
+ predecessor[SGLT_GET_OFFSET(lt)] != InvalidOffsetNumber)
elog(ERROR, "inconsistent tuple chain links in page %u of index \"%s\"",
BufferGetBlockNumber(buffer),
RelationGetRelationName(index));
- predecessor[lt->nextOffset] = i;
+ predecessor[SGLT_GET_OFFSET(lt)] = i;
}
}
else if (lt->tupstate == SPGIST_REDIRECT)
{
SpGistDeadTuple dt = (SpGistDeadTuple) lt;
- Assert(dt->nextOffset == InvalidOffsetNumber);
+ /*
+ * Dead tuple nextOffset is allowed to have any values of two
+ * highest bits in case it is inherited from SpGistLeafTuple where
+ * these bits have their own meaning.
+ */
+ Assert(SGLT_GET_OFFSET(dt) == InvalidOffsetNumber);
Assert(ItemPointerIsValid(&dt->pointer));
/*
@@ -201,7 +206,7 @@ vacuumLeafPage(spgBulkDeleteState *bds, Relation index, Buffer buffer,
}
else
{
- Assert(lt->nextOffset == InvalidOffsetNumber);
+ Assert(SGLT_GET_OFFSET(lt) == InvalidOffsetNumber);
}
}
@@ -250,7 +255,7 @@ vacuumLeafPage(spgBulkDeleteState *bds, Relation index, Buffer buffer,
prevLive = deletable[i] ? InvalidOffsetNumber : i;
/* scan down the chain ... */
- j = head->nextOffset;
+ j = SGLT_GET_OFFSET(head);
while (j != InvalidOffsetNumber)
{
SpGistLeafTuple lt;
@@ -301,7 +306,7 @@ vacuumLeafPage(spgBulkDeleteState *bds, Relation index, Buffer buffer,
interveningDeletable = false;
}
- j = lt->nextOffset;
+ j = SGLT_GET_OFFSET(lt);
}
if (prevLive == InvalidOffsetNumber)
@@ -366,7 +371,7 @@ vacuumLeafPage(spgBulkDeleteState *bds, Relation index, Buffer buffer,
lt = (SpGistLeafTuple) PageGetItem(page,
PageGetItemId(page, chainSrc[i]));
Assert(lt->tupstate == SPGIST_LIVE);
- lt->nextOffset = chainDest[i];
+ SGLT_SET_OFFSET(lt, chainDest[i]);
}
MarkBufferDirty(buffer);
diff --git a/src/backend/access/spgist/spgxlog.c b/src/backend/access/spgist/spgxlog.c
index 7be2291d07..bbc2b91abc 100644
--- a/src/backend/access/spgist/spgxlog.c
+++ b/src/backend/access/spgist/spgxlog.c
@@ -122,8 +122,8 @@ spgRedoAddLeaf(XLogReaderState *record)
head = (SpGistLeafTuple) PageGetItem(page,
PageGetItemId(page, xldata->offnumHeadLeaf));
- Assert(head->nextOffset == leafTupleHdr.nextOffset);
- head->nextOffset = xldata->offnumLeaf;
+ Assert(SGLT_GET_OFFSET(head) == SGLT_GET_OFFSET(&leafTupleHdr));
+ SGLT_SET_OFFSET(head, xldata->offnumLeaf);
}
}
else
@@ -822,7 +822,7 @@ spgRedoVacuumLeaf(XLogReaderState *record)
lt = (SpGistLeafTuple) PageGetItem(page,
PageGetItemId(page, chainSrc[i]));
Assert(lt->tupstate == SPGIST_LIVE);
- lt->nextOffset = chainDest[i];
+ SGLT_SET_OFFSET(lt, chainDest[i]);
}
PageSetLSN(page, lsn);
diff --git a/src/include/access/spgist_private.h b/src/include/access/spgist_private.h
index 00b98ec6a0..03cbf826a7 100644
--- a/src/include/access/spgist_private.h
+++ b/src/include/access/spgist_private.h
@@ -22,13 +22,14 @@
#include "utils/geo_decls.h"
#include "utils/relcache.h"
-
typedef struct SpGistOptions
{
int32 varlena_header_; /* varlena header (do not touch directly!) */
int fillfactor; /* page fill factor in percent (0..100) */
} SpGistOptions;
+#define spgKeyColumn 0
+
#define SpGistGetFillFactor(relation) \
(AssertMacro(relation->rd_rel->relkind == RELKIND_INDEX && \
relation->rd_rel->relam == SPGIST_AM_OID), \
@@ -141,6 +142,7 @@ typedef struct SpGistState
SpGistTypeDesc attLeafType; /* type of leaf-tuple values */
SpGistTypeDesc attPrefixType; /* type of inner-tuple prefix values */
SpGistTypeDesc attLabelType; /* type of node label values */
+ TupleDesc includeTupdesc; /* tuple descriptor of INCLUDE columns */
char *deadTupleStorage; /* workspace for spgFormDeadTuple */
@@ -148,104 +150,6 @@ typedef struct SpGistState
bool isBuild; /* true if doing index build */
} SpGistState;
-typedef struct SpGistSearchItem
-{
- pairingheap_node phNode; /* pairing heap node */
- Datum value; /* value reconstructed from parent or
- * leafValue if heaptuple */
- void *traversalValue; /* opclass-specific traverse value */
- int level; /* level of items on this page */
- ItemPointerData heapPtr; /* heap info, if heap tuple */
- bool isNull; /* SearchItem is NULL item */
- bool isLeaf; /* SearchItem is heap item */
- bool recheck; /* qual recheck is needed */
- bool recheckDistances; /* distance recheck is needed */
-
- /* array with numberOfOrderBys entries */
- double distances[FLEXIBLE_ARRAY_MEMBER];
-} SpGistSearchItem;
-
-#define SizeOfSpGistSearchItem(n_distances) \
- (offsetof(SpGistSearchItem, distances) + sizeof(double) * (n_distances))
-
-/*
- * Private state of an index scan
- */
-typedef struct SpGistScanOpaqueData
-{
- SpGistState state; /* see above */
- pairingheap *scanQueue; /* queue of to be visited items */
- MemoryContext tempCxt; /* short-lived memory context */
- MemoryContext traversalCxt; /* single scan lifetime memory context */
-
- /* Control flags showing whether to search nulls and/or non-nulls */
- bool searchNulls; /* scan matches (all) null entries */
- bool searchNonNulls; /* scan matches (some) non-null entries */
-
- /* Index quals to be passed to opclass (null-related quals removed) */
- int numberOfKeys; /* number of index qualifier conditions */
- ScanKey keyData; /* array of index qualifier descriptors */
- int numberOfOrderBys; /* number of ordering operators */
- int numberOfNonNullOrderBys; /* number of ordering operators
- * with non-NULL arguments */
- ScanKey orderByData; /* array of ordering op descriptors */
- Oid *orderByTypes; /* array of ordering op return types */
- int *nonNullOrderByOffsets; /* array of offset of non-NULL
- * ordering keys in the original array */
- Oid indexCollation; /* collation of index column */
-
- /* Opclass defined functions: */
- FmgrInfo innerConsistentFn;
- FmgrInfo leafConsistentFn;
-
- /* Pre-allocated workspace arrays: */
- double *zeroDistances;
- double *infDistances;
-
- /* These fields are only used in amgetbitmap scans: */
- TIDBitmap *tbm; /* bitmap being filled */
- int64 ntids; /* number of TIDs passed to bitmap */
-
- /* These fields are only used in amgettuple scans: */
- bool want_itup; /* are we reconstructing tuples? */
- TupleDesc indexTupDesc; /* if so, tuple descriptor for them */
- int nPtrs; /* number of TIDs found on current page */
- int iPtr; /* index for scanning through same */
- ItemPointerData heapPtrs[MaxIndexTuplesPerPage]; /* TIDs from cur page */
- bool recheck[MaxIndexTuplesPerPage]; /* their recheck flags */
- bool recheckDistances[MaxIndexTuplesPerPage]; /* distance recheck
- * flags */
- HeapTuple reconTups[MaxIndexTuplesPerPage]; /* reconstructed tuples */
-
- /* distances (for recheck) */
- IndexOrderByDistance *distances[MaxIndexTuplesPerPage];
-
- /*
- * Note: using MaxIndexTuplesPerPage above is a bit hokey since
- * SpGistLeafTuples aren't exactly IndexTuples; however, they are larger,
- * so this is safe.
- */
-} SpGistScanOpaqueData;
-
-typedef SpGistScanOpaqueData *SpGistScanOpaque;
-
-/*
- * This struct is what we actually keep in index->rd_amcache. It includes
- * static configuration information as well as the lastUsedPages cache.
- */
-typedef struct SpGistCache
-{
- spgConfigOut config; /* filled in by opclass config method */
-
- SpGistTypeDesc attType; /* type of values to be indexed/restored */
- SpGistTypeDesc attLeafType; /* type of leaf-tuple values */
- SpGistTypeDesc attPrefixType; /* type of inner-tuple prefix values */
- SpGistTypeDesc attLabelType; /* type of node label values */
-
- SpGistLUPCache lastUsedPages; /* local storage of last-used info */
-} SpGistCache;
-
-
/*
* SPGiST tuple types. Note: inner, leaf, and dead tuple structs
* must have the same tupstate field in the same position! Real inner and
@@ -305,8 +209,8 @@ typedef SpGistInnerTupleData *SpGistInnerTuple;
* SPGiST node tuple: one node within an inner tuple
*
* Node tuples use the same header as ordinary Postgres IndexTuples, but
- * we do not use a null bitmap, because we know there is only one column
- * so the INDEX_NULL_MASK bit suffices. Also, pass-by-value datums are
+ * we do not use a null bitmap, because we know there is only one key column
+ * so the INDEX_NULL_MASK bit suffices. Also, pass-by-value datums are
* stored as a full Datum, the same convention as for inner tuple prefixes
* and leaf tuple datums.
*/
@@ -322,21 +226,19 @@ typedef SpGistNodeTupleData *SpGistNodeTuple;
PointerGetDatum(SGNTDATAPTR(x)))
/*
- * SPGiST leaf tuple: carries a datum and a heap tuple TID
+ * SPGiST leaf tuple: carries a key datum, a heap tuple TID and optional
+ * datums and nullmask of INCLUDE columns.
*
- * In the simplest case, the datum is the same as the indexed value; but
+ * In the simplest case, the key datum is the same as the indexed value; but
* it could also be a suffix or some other sort of delta that permits
* reconstruction given knowledge of the prefix path traversed to get here.
+ * Datums of INCLUDE columns are stored without modification.
*
* The size field is wider than could possibly be needed for an on-disk leaf
* tuple, but this allows us to form leaf tuples even when the datum is too
* wide to be stored immediately, and it costs nothing because of alignment
* considerations.
*
- * Normally, nextOffset links to the next tuple belonging to the same parent
- * node (which must be on the same page). But when the root page is a leaf
- * page, we don't chain its tuples, so nextOffset is always 0 on the root.
- *
* size must be a multiple of MAXALIGN; also, it must be at least SGDTSIZE
* so that the tuple can be converted to REDIRECT status later. (This
* restriction only adds bytes for the null-datum case, otherwise alignment
@@ -346,14 +248,48 @@ typedef SpGistNodeTupleData *SpGistNodeTuple;
* however, the SGDTSIZE limit ensures that's there's a Datum word there
* anyway, so SGLTDATUM can be applied safely as long as you don't do
* anything with the result.
+ *
+ * Normally, nextOffset inside t_info links to the next tuple belonging to
+ * the same parent node (which must be on the same page). But when the root
+ * page is a leaf page, we don't chain its tuples, so nextOffset is always 0
+ * on the root. Minimum space to store SpGistLeafTuple plus ItemIdData on a
+ * page is 16 bytes, so 14 lower bits for nextOffset is enough to store tuple
+ * number in a chain on a page even if a page size is 64Kb.
+ *
+ * Two higher bits in t_info are to store per-tuple information for INCLUDE
+ * attributes: is there nulls mask exist, and are there any INCLUDE attributes
+ * of variable length type. If there are no INCLUDE columns these higher bits
+ * are not used and can have any values.
+ *
+ * If there are INCLUDE columns, they are stored after a key value, each
+ * starting from its own typalign boundary. Unlike IndexTuple, first INCLUDE
+ * value does not need to start from MAXALIGN boundary, so SPGiST uses private
+ * routines to access them. Nullmask with size (number of INCLUDE columns)/8
+ * bytes is put without alignment between the key and the first INCLUDE column.
+ * If there is an alignment gap between them, nullmask has a good chance to fit
+ * into the gap, thus making its storage free of charge.
*/
+
typedef struct SpGistLeafTupleData
{
unsigned int tupstate:2, /* LIVE/REDIRECT/DEAD/PLACEHOLDER */
size:30; /* large enough for any palloc'able value */
- OffsetNumber nextOffset; /* next tuple in chain, or InvalidOffsetNumber */
+
+ /* ---------------
+ * t_info is laid out in the following fashion:
+ *
+ * 15th (high) bit: INCLUDE values has nulls
+ * 14th bit: INCLUDE values has var-length attributes
+ * 13-0 bit: nextOffset i.e. number of next tuple in chain on a page,
+ * or InvalidOffsetNumber
+ * ---------------
+ */
+ unsigned short t_info; /* nextOffset for linking tuples in a chain on a leaf
+ page, and additional info for INCLUDE attributes */
ItemPointerData heapPtr; /* TID of represented heap tuple */
- /* leaf datum follows */
+ /* key column data follows */
+ /* nullmask of INCLUDE values follows if there are nulls in INCLUDE attributes*/
+ /* INCLUDE columns data follow if any */
} SpGistLeafTupleData;
typedef SpGistLeafTupleData *SpGistLeafTuple;
@@ -361,8 +297,25 @@ typedef SpGistLeafTupleData *SpGistLeafTuple;
#define SGLTHDRSZ MAXALIGN(sizeof(SpGistLeafTupleData))
#define SGLTDATAPTR(x) (((char *) (x)) + SGLTHDRSZ)
#define SGLTDATUM(x, s) ((s)->attLeafType.attbyval ? \
- *(Datum *) SGLTDATAPTR(x) : \
- PointerGetDatum(SGLTDATAPTR(x)))
+ *(Datum *) SGLTDATAPTR(x) : \
+ PointerGetDatum(SGLTDATAPTR(x)))
+/*
+ * Macros to access nextOffset and bit fields inside t_info independently.
+ */
+#define SGLT_GET_OFFSET(spgLeafTuple) ( (spgLeafTuple)->t_info & 0x3FFF )
+#define SGLT_GET_CONTAINSNULLMASK(spgLeafTuple) \
+ ( (bool)((spgLeafTuple)->t_info >> 15) )
+#define SGLT_GET_CONTAINSVARATT(spgLeafTuple) \
+ ( (bool)(((spgLeafTuple)->t_info & 0x4000) >> 14) )
+#define SGLT_SET_OFFSET(spgLeafTuple, offsetNumber) \
+ ( (spgLeafTuple)->t_info = \
+ ((spgLeafTuple)->t_info & 0xC000) | ((offsetNumber) & 0x3FFF) )
+#define SGLT_SET_CONTAINSNULLMASK(spgLeafTuple, is_null) \
+ ( (spgLeafTuple)->t_info = \
+ ((uint16)(bool)(is_null) << 15) | ((spgLeafTuple)->t_info & 0x3FFF) )
+#define SGLT_SET_CONTAINSVARATT(spgLeafTuple, is_varatt) \
+ ( (spgLeafTuple)->t_info = \
+ ((uint16)(bool)(is_varatt) << 14) | ((spgLeafTuple)->t_info & 0xBFFF) )
/*
* SPGiST dead tuple: declaration for examining non-live tuples
@@ -372,14 +325,14 @@ typedef SpGistLeafTupleData *SpGistLeafTuple;
* Also, the pointer field must be in the same place as a leaf tuple's heapPtr
* field, to satisfy some Asserts that we make when replacing a leaf tuple
* with a dead tuple.
- * We don't use nextOffset, but it's needed to align the pointer field.
+ * We don't use t_info, but it's needed to align the pointer field.
* pointer and xid are only valid when tupstate = REDIRECT.
*/
typedef struct SpGistDeadTupleData
{
unsigned int tupstate:2, /* LIVE/REDIRECT/DEAD/PLACEHOLDER */
size:30;
- OffsetNumber nextOffset; /* not used in dead tuples */
+ unsigned short t_info; /* not used in dead tuples */
ItemPointerData pointer; /* redirection inside index */
TransactionId xid; /* ID of xact that inserted this tuple */
} SpGistDeadTupleData;
@@ -394,7 +347,6 @@ typedef SpGistDeadTupleData *SpGistDeadTuple;
* size plus sizeof(ItemIdData) (for the line pointer). This works correctly
* so long as tuple sizes are always maxaligned.
*/
-
/* Page capacity after allowing for fixed header and special space */
#define SPGIST_PAGE_CAPACITY \
MAXALIGN_DOWN(BLCKSZ - \
@@ -410,6 +362,105 @@ typedef SpGistDeadTupleData *SpGistDeadTuple;
Min(SpGistPageGetOpaque(p)->nPlaceholder, n) * \
(SGDTSIZE + sizeof(ItemIdData)))
+
+typedef struct SpGistSearchItem
+{
+ pairingheap_node phNode; /* pairing heap node */
+ Datum value; /* value reconstructed from parent or
+ * leafValue if heaptuple */
+ void *traversalValue; /* opclass-specific traverse value */
+ int level; /* level of items on this page */
+ ItemPointerData heapPtr; /* heap info, if heap tuple */
+ bool isNull; /* SearchItem is NULL item */
+ bool isLeaf; /* SearchItem is heap item */
+ bool recheck; /* qual recheck is needed */
+ bool recheckDistances; /* distance recheck is needed */
+ SpGistLeafTuple leafTuple;
+ /* array with numberOfOrderBys entries */
+ double distances[FLEXIBLE_ARRAY_MEMBER];
+} SpGistSearchItem;
+
+#define SizeOfSpGistSearchItem(n_distances) \
+ (offsetof(SpGistSearchItem, distances) + sizeof(double) * (n_distances))
+
+/*
+ * Private state of an index scan
+ */
+typedef struct SpGistScanOpaqueData
+{
+ SpGistState state; /* see above */
+ pairingheap *scanQueue; /* queue of to be visited items */
+ MemoryContext tempCxt; /* short-lived memory context */
+ MemoryContext traversalCxt; /* single scan lifetime memory context */
+
+ /* Control flags showing whether to search nulls and/or non-nulls */
+ bool searchNulls; /* scan matches (all) null entries */
+ bool searchNonNulls; /* scan matches (some) non-null entries */
+
+ /* Index quals to be passed to opclass (null-related quals removed) */
+ int numberOfKeys; /* number of index qualifier conditions */
+ ScanKey keyData; /* array of index qualifier descriptors */
+ int numberOfOrderBys; /* number of ordering operators */
+ int numberOfNonNullOrderBys; /* number of ordering operators
+ * with non-NULL arguments */
+ ScanKey orderByData; /* array of ordering op descriptors */
+ Oid *orderByTypes; /* array of ordering op return types */
+ int *nonNullOrderByOffsets; /* array of offset of non-NULL
+ * ordering keys in the original array */
+ Oid indexCollation; /* collation of index column */
+
+ /* Opclass defined functions: */
+ FmgrInfo innerConsistentFn;
+ FmgrInfo leafConsistentFn;
+
+ /* Pre-allocated workspace arrays: */
+ double *zeroDistances;
+ double *infDistances;
+
+ /* These fields are only used in amgetbitmap scans: */
+ TIDBitmap *tbm; /* bitmap being filled */
+ int64 ntids; /* number of TIDs passed to bitmap */
+
+ /* These fields are only used in amgettuple scans: */
+ bool want_itup; /* are we reconstructing tuples? */
+ TupleDesc indexTupDesc; /* if so, tuple descriptor for them */
+ int nPtrs; /* number of TIDs found on current page */
+ int iPtr; /* index for scanning through same */
+ ItemPointerData heapPtrs[MaxIndexTuplesPerPage]; /* TIDs from cur page */
+ bool recheck[MaxIndexTuplesPerPage]; /* their recheck flags */
+ bool recheckDistances[MaxIndexTuplesPerPage]; /* distance recheck
+ * flags */
+ HeapTuple reconTups[MaxIndexTuplesPerPage]; /* reconstructed tuples */
+
+ /* distances (for recheck) */
+ IndexOrderByDistance *distances[MaxIndexTuplesPerPage];
+
+ /*
+ * Note: using MaxIndexTuplesPerPage above is a bit hokey since
+ * SpGistLeafTuples aren't exactly IndexTuples; however, they are larger,
+ * so this is safe.
+ */
+} SpGistScanOpaqueData;
+
+typedef SpGistScanOpaqueData *SpGistScanOpaque;
+
+/*
+ * This struct is what we actually keep in index->rd_amcache. It includes
+ * static configuration information as well as the lastUsedPages cache.
+ */
+typedef struct SpGistCache
+{
+ spgConfigOut config; /* filled in by opclass config method */
+
+ SpGistTypeDesc attType; /* type of values to be indexed/restored */
+ SpGistTypeDesc attLeafType; /* type of leaf-tuple values */
+ SpGistTypeDesc attPrefixType; /* type of inner-tuple prefix values */
+ SpGistTypeDesc attLabelType; /* type of node label values */
+ TupleDesc includeTupdesc;
+
+ SpGistLUPCache lastUsedPages; /* local storage of last-used info */
+} SpGistCache;
+
/*
* XLOG stuff
*/
@@ -456,9 +507,10 @@ extern void SpGistInitPage(Page page, uint16 f);
extern void SpGistInitBuffer(Buffer b, uint16 f);
extern void SpGistInitMetapage(Page page);
extern unsigned int SpGistGetTypeSize(SpGistTypeDesc *att, Datum datum);
+extern unsigned int spgLeafTupleSize(SpGistState *state, Datum *datum, bool *isnull);
extern SpGistLeafTuple spgFormLeafTuple(SpGistState *state,
ItemPointer heapPtr,
- Datum datum, bool isnull);
+ Datum *datum, bool *isnull);
extern SpGistNodeTuple spgFormNodeTuple(SpGistState *state,
Datum label, bool isnull);
extern SpGistInnerTuple spgFormInnerTuple(SpGistState *state,
@@ -466,6 +518,8 @@ extern SpGistInnerTuple spgFormInnerTuple(SpGistState *state,
int nNodes, SpGistNodeTuple *nodes);
extern SpGistDeadTuple spgFormDeadTuple(SpGistState *state, int tupstate,
BlockNumber blkno, OffsetNumber offnum);
+extern void spgDeformLeafTuple(SpGistLeafTuple tup, SpGistState *state,
+ Datum *datum, bool *isnull, bool key_value_isnull);
extern Datum *spgExtractNodeLabels(SpGistState *state,
SpGistInnerTuple innerTuple);
extern OffsetNumber SpGistPageAddNewItem(SpGistState *state, Page page,
@@ -484,7 +538,7 @@ extern void spgPageIndexMultiDelete(SpGistState *state, Page page,
int firststate, int reststate,
BlockNumber blkno, OffsetNumber offnum);
extern bool spgdoinsert(Relation index, SpGistState *state,
- ItemPointer heapPtr, Datum datum, bool isnull);
+ ItemPointer heapPtr, Datum *datum, bool *isnull);
/* spgproc.c */
extern double *spg_key_orderbys_distances(Datum key, bool isLeaf,
diff --git a/src/test/regress/expected/amutils.out b/src/test/regress/expected/amutils.out
index d92a6d12c6..93e6a43b6d 100644
--- a/src/test/regress/expected/amutils.out
+++ b/src/test/regress/expected/amutils.out
@@ -169,9 +169,9 @@ select amname, prop, pg_indexam_has_property(a.oid, prop) as p
hash | bogus |
spgist | can_order | f
spgist | can_unique | f
- spgist | can_multi_col | f
+ spgist | can_multi_col | t
spgist | can_exclude | t
- spgist | can_include | f
+ spgist | can_include | t
spgist | bogus |
(36 rows)
diff --git a/src/test/regress/expected/index_including.out b/src/test/regress/expected/index_including.out
index 8e5d53e712..86510687c7 100644
--- a/src/test/regress/expected/index_including.out
+++ b/src/test/regress/expected/index_including.out
@@ -349,14 +349,13 @@ SELECT indexdef FROM pg_indexes WHERE tablename = 'tbl' ORDER BY indexname;
DROP TABLE tbl;
/*
- * 7. Check various AMs. All but btree and gist must fail.
+ * 7. Check various AMs. All but btree, gist and spgist must fail.
*/
CREATE TABLE tbl (c1 int,c2 int, c3 box, c4 box);
CREATE INDEX on tbl USING brin(c1, c2) INCLUDE (c3, c4);
ERROR: access method "brin" does not support included columns
CREATE INDEX on tbl USING gist(c3) INCLUDE (c1, c4);
CREATE INDEX on tbl USING spgist(c3) INCLUDE (c4);
-ERROR: access method "spgist" does not support included columns
CREATE INDEX on tbl USING gin(c1, c2) INCLUDE (c3, c4);
ERROR: access method "gin" does not support included columns
CREATE INDEX on tbl USING hash(c1, c2) INCLUDE (c3, c4);
diff --git a/src/test/regress/expected/index_including_spgist.out b/src/test/regress/expected/index_including_spgist.out
new file mode 100644
index 0000000000..213cce5c7c
--- /dev/null
+++ b/src/test/regress/expected/index_including_spgist.out
@@ -0,0 +1,143 @@
+/*
+ * 1.1. test CREATE INDEX with buffered build
+ */
+-- Regular index with included columns
+CREATE TABLE tbl_spgist (c1 int, c2 int, c3 int, c4 box);
+SET enable_seqscan TO off;
+-- size is chosen to exceed page size and trigger actual truncation
+INSERT INTO tbl_spgist SELECT x, 2*x, 3*x, box(point(x,x+1),point(2*x,2*x+1)) FROM generate_series(1,8000) AS x;
+CREATE INDEX tbl_spgist_idx ON tbl_spgist using spgist (c4) INCLUDE (c1,c2,c3);
+SELECT pg_get_indexdef(i.indexrelid)
+FROM pg_index i JOIN pg_class c ON i.indexrelid = c.oid
+WHERE i.indrelid = 'tbl_spgist'::regclass ORDER BY c.relname;
+ pg_get_indexdef
+-----------------------------------------------------------------------------------------
+ CREATE INDEX tbl_spgist_idx ON public.tbl_spgist USING spgist (c4) INCLUDE (c1, c2, c3)
+(1 row)
+
+SELECT * FROM tbl_spgist where c4 <@ box(point(1,1),point(10,10));
+ c1 | c2 | c3 | c4
+----+----+----+-------------
+ 1 | 2 | 3 | (2,3),(1,2)
+ 2 | 4 | 6 | (4,5),(2,3)
+ 3 | 6 | 9 | (6,7),(3,4)
+ 4 | 8 | 12 | (8,9),(4,5)
+(4 rows)
+
+SET enable_bitmapscan TO off;
+VACUUM ANALYZE tbl_spgist;
+EXPLAIN (costs off) SELECT * FROM tbl_spgist where c4 <@ box(point(1,1),point(10,10));
+ QUERY PLAN
+----------------------------------------------------
+ Index Only Scan using tbl_spgist_idx on tbl_spgist
+ Index Cond: (c4 <@ '(10,10),(1,1)'::box)
+(2 rows)
+
+SET enable_bitmapscan TO default;
+DROP TABLE tbl_spgist;
+/*
+ * 1.2. test CREATE INDEX with inserts
+ */
+-- Regular index with included columns
+CREATE TABLE tbl_spgist (c1 int, c2 int, c3 int, c4 box);
+-- size is chosen to exceed page size and trigger actual truncation
+CREATE INDEX tbl_spgist_idx ON tbl_spgist using spgist (c4) INCLUDE (c1,c2,c3);
+INSERT INTO tbl_spgist SELECT x, 2*x, 3*x, box(point(x,x+1),point(2*x,2*x+1)) FROM generate_series(1,8000) AS x;
+SELECT pg_get_indexdef(i.indexrelid)
+FROM pg_index i JOIN pg_class c ON i.indexrelid = c.oid
+WHERE i.indrelid = 'tbl_spgist'::regclass ORDER BY c.relname;
+ pg_get_indexdef
+-----------------------------------------------------------------------------------------
+ CREATE INDEX tbl_spgist_idx ON public.tbl_spgist USING spgist (c4) INCLUDE (c1, c2, c3)
+(1 row)
+
+SELECT * FROM tbl_spgist where c4 <@ box(point(1,1),point(10,10));
+ c1 | c2 | c3 | c4
+----+----+----+-------------
+ 1 | 2 | 3 | (2,3),(1,2)
+ 2 | 4 | 6 | (4,5),(2,3)
+ 3 | 6 | 9 | (6,7),(3,4)
+ 4 | 8 | 12 | (8,9),(4,5)
+(4 rows)
+
+SET enable_bitmapscan TO off;
+VACUUM ANALYZE tbl_spgist;
+EXPLAIN (costs off) SELECT * FROM tbl_spgist where c4 <@ box(point(1,1),point(10,10));
+ QUERY PLAN
+----------------------------------------------------
+ Index Only Scan using tbl_spgist_idx on tbl_spgist
+ Index Cond: (c4 <@ '(10,10),(1,1)'::box)
+(2 rows)
+
+SET enable_bitmapscan TO default;
+DROP TABLE tbl_spgist;
+/*
+ * 2. CREATE INDEX CONCURRENTLY
+ */
+CREATE TABLE tbl_spgist (c1 int, c2 int, c3 int, c4 box);
+INSERT INTO tbl_spgist SELECT x, 2*x, 3*x, box(point(x,x+1),point(2*x,2*x+1)) FROM generate_series(1,10) AS x;
+CREATE INDEX CONCURRENTLY tbl_spgist_idx ON tbl_spgist using spgist (c4) INCLUDE (c1,c2,c3);
+SELECT indexdef FROM pg_indexes WHERE tablename = 'tbl_spgist' ORDER BY indexname;
+ indexdef
+-----------------------------------------------------------------------------------------
+ CREATE INDEX tbl_spgist_idx ON public.tbl_spgist USING spgist (c4) INCLUDE (c1, c2, c3)
+(1 row)
+
+DROP TABLE tbl_spgist;
+/*
+ * 3. REINDEX
+ */
+CREATE TABLE tbl_spgist (c1 int, c2 int, c3 int, c4 box);
+INSERT INTO tbl_spgist SELECT x, 2*x, 3*x, box(point(x,x+1),point(2*x,2*x+1)) FROM generate_series(1,10) AS x;
+CREATE INDEX tbl_spgist_idx ON tbl_spgist using spgist (c4) INCLUDE (c1,c3);
+SELECT indexdef FROM pg_indexes WHERE tablename = 'tbl_spgist' ORDER BY indexname;
+ indexdef
+-------------------------------------------------------------------------------------
+ CREATE INDEX tbl_spgist_idx ON public.tbl_spgist USING spgist (c4) INCLUDE (c1, c3)
+(1 row)
+
+REINDEX INDEX tbl_spgist_idx;
+SELECT indexdef FROM pg_indexes WHERE tablename = 'tbl_spgist' ORDER BY indexname;
+ indexdef
+-------------------------------------------------------------------------------------
+ CREATE INDEX tbl_spgist_idx ON public.tbl_spgist USING spgist (c4) INCLUDE (c1, c3)
+(1 row)
+
+ALTER TABLE tbl_spgist DROP COLUMN c1;
+SELECT indexdef FROM pg_indexes WHERE tablename = 'tbl_spgist' ORDER BY indexname;
+ indexdef
+----------
+(0 rows)
+
+DROP TABLE tbl_spgist;
+/*
+ * 4. Update, delete values in indexed table.
+ */
+CREATE TABLE tbl_spgist (c1 int, c2 int, c3 int, c4 box);
+INSERT INTO tbl_spgist SELECT x, 2*x, 3*x, box(point(x,x+1),point(2*x,2*x+1)) FROM generate_series(1,10) AS x;
+CREATE INDEX tbl_spgist_idx ON tbl_spgist using spgist (c4) INCLUDE (c1,c3);
+UPDATE tbl_spgist SET c1 = 100 WHERE c1 = 2;
+UPDATE tbl_spgist SET c1 = 1 WHERE c1 = 3;
+DELETE FROM tbl_spgist WHERE c1 = 5 OR c3 = 12;
+DROP TABLE tbl_spgist;
+/*
+ * 5. Alter column type.
+ */
+CREATE TABLE tbl_spgist (c1 int, c2 int, c3 int, c4 box);
+INSERT INTO tbl_spgist SELECT x, 2*x, 3*x, box(point(x,x+1),point(2*x,2*x+1)) FROM generate_series(1,10) AS x;
+CREATE INDEX tbl_spgist_idx ON tbl_spgist using spgist (c4) INCLUDE (c1,c3);
+ALTER TABLE tbl_spgist ALTER c1 TYPE bigint;
+ALTER TABLE tbl_spgist ALTER c3 TYPE bigint;
+\d tbl_spgist
+ Table "public.tbl_spgist"
+ Column | Type | Collation | Nullable | Default
+--------+---------+-----------+----------+---------
+ c1 | bigint | | |
+ c2 | integer | | |
+ c3 | bigint | | |
+ c4 | box | | |
+Indexes:
+ "tbl_spgist_idx" spgist (c4) INCLUDE (c1, c3)
+
+RESET enable_seqscan;
+DROP TABLE tbl_spgist;
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 026ea880cd..985458a1a8 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -50,7 +50,7 @@ test: copy copyselect copydml insert insert_conflict
# ----------
test: create_misc create_operator create_procedure
# These depend on create_misc and create_operator
-test: create_index create_index_spgist create_view index_including index_including_gist
+test: create_index create_index_spgist create_view index_including index_including_gist index_including_spgist
# ----------
# Another group of parallel tests
diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index 979d926119..f3df961535 100644
--- a/src/test/regress/serial_schedule
+++ b/src/test/regress/serial_schedule
@@ -68,6 +68,7 @@ test: create_index_spgist
test: create_view
test: index_including
test: index_including_gist
+test: index_including_spgist
test: create_aggregate
test: create_function_3
test: create_cast
diff --git a/src/test/regress/sql/index_including.sql b/src/test/regress/sql/index_including.sql
index 7e517483ad..44b340053b 100644
--- a/src/test/regress/sql/index_including.sql
+++ b/src/test/regress/sql/index_including.sql
@@ -182,7 +182,7 @@ SELECT indexdef FROM pg_indexes WHERE tablename = 'tbl' ORDER BY indexname;
DROP TABLE tbl;
/*
- * 7. Check various AMs. All but btree and gist must fail.
+ * 7. Check various AMs. All but btree, gist and spgist must fail.
*/
CREATE TABLE tbl (c1 int,c2 int, c3 box, c4 box);
CREATE INDEX on tbl USING brin(c1, c2) INCLUDE (c3, c4);
diff --git a/src/test/regress/sql/index_including_spgist.sql b/src/test/regress/sql/index_including_spgist.sql
new file mode 100644
index 0000000000..38ace74d4e
--- /dev/null
+++ b/src/test/regress/sql/index_including_spgist.sql
@@ -0,0 +1,84 @@
+/*
+ * 1.1. test CREATE INDEX with buffered build
+ */
+
+-- Regular index with included columns
+CREATE TABLE tbl_spgist (c1 int, c2 int, c3 int, c4 box);
+SET enable_seqscan TO off;
+-- size is chosen to exceed page size and trigger actual truncation
+INSERT INTO tbl_spgist SELECT x, 2*x, 3*x, box(point(x,x+1),point(2*x,2*x+1)) FROM generate_series(1,8000) AS x;
+CREATE INDEX tbl_spgist_idx ON tbl_spgist using spgist (c4) INCLUDE (c1,c2,c3);
+SELECT pg_get_indexdef(i.indexrelid)
+FROM pg_index i JOIN pg_class c ON i.indexrelid = c.oid
+WHERE i.indrelid = 'tbl_spgist'::regclass ORDER BY c.relname;
+SELECT * FROM tbl_spgist where c4 <@ box(point(1,1),point(10,10));
+SET enable_bitmapscan TO off;
+VACUUM ANALYZE tbl_spgist;
+EXPLAIN (costs off) SELECT * FROM tbl_spgist where c4 <@ box(point(1,1),point(10,10));
+SET enable_bitmapscan TO default;
+DROP TABLE tbl_spgist;
+
+/*
+ * 1.2. test CREATE INDEX with inserts
+ */
+
+-- Regular index with included columns
+CREATE TABLE tbl_spgist (c1 int, c2 int, c3 int, c4 box);
+-- size is chosen to exceed page size and trigger actual truncation
+CREATE INDEX tbl_spgist_idx ON tbl_spgist using spgist (c4) INCLUDE (c1,c2,c3);
+INSERT INTO tbl_spgist SELECT x, 2*x, 3*x, box(point(x,x+1),point(2*x,2*x+1)) FROM generate_series(1,8000) AS x;
+SELECT pg_get_indexdef(i.indexrelid)
+FROM pg_index i JOIN pg_class c ON i.indexrelid = c.oid
+WHERE i.indrelid = 'tbl_spgist'::regclass ORDER BY c.relname;
+SELECT * FROM tbl_spgist where c4 <@ box(point(1,1),point(10,10));
+SET enable_bitmapscan TO off;
+VACUUM ANALYZE tbl_spgist;
+EXPLAIN (costs off) SELECT * FROM tbl_spgist where c4 <@ box(point(1,1),point(10,10));
+SET enable_bitmapscan TO default;
+DROP TABLE tbl_spgist;
+
+/*
+ * 2. CREATE INDEX CONCURRENTLY
+ */
+CREATE TABLE tbl_spgist (c1 int, c2 int, c3 int, c4 box);
+INSERT INTO tbl_spgist SELECT x, 2*x, 3*x, box(point(x,x+1),point(2*x,2*x+1)) FROM generate_series(1,10) AS x;
+CREATE INDEX CONCURRENTLY tbl_spgist_idx ON tbl_spgist using spgist (c4) INCLUDE (c1,c2,c3);
+SELECT indexdef FROM pg_indexes WHERE tablename = 'tbl_spgist' ORDER BY indexname;
+DROP TABLE tbl_spgist;
+
+
+/*
+ * 3. REINDEX
+ */
+CREATE TABLE tbl_spgist (c1 int, c2 int, c3 int, c4 box);
+INSERT INTO tbl_spgist SELECT x, 2*x, 3*x, box(point(x,x+1),point(2*x,2*x+1)) FROM generate_series(1,10) AS x;
+CREATE INDEX tbl_spgist_idx ON tbl_spgist using spgist (c4) INCLUDE (c1,c3);
+SELECT indexdef FROM pg_indexes WHERE tablename = 'tbl_spgist' ORDER BY indexname;
+REINDEX INDEX tbl_spgist_idx;
+SELECT indexdef FROM pg_indexes WHERE tablename = 'tbl_spgist' ORDER BY indexname;
+ALTER TABLE tbl_spgist DROP COLUMN c1;
+SELECT indexdef FROM pg_indexes WHERE tablename = 'tbl_spgist' ORDER BY indexname;
+DROP TABLE tbl_spgist;
+
+/*
+ * 4. Update, delete values in indexed table.
+ */
+CREATE TABLE tbl_spgist (c1 int, c2 int, c3 int, c4 box);
+INSERT INTO tbl_spgist SELECT x, 2*x, 3*x, box(point(x,x+1),point(2*x,2*x+1)) FROM generate_series(1,10) AS x;
+CREATE INDEX tbl_spgist_idx ON tbl_spgist using spgist (c4) INCLUDE (c1,c3);
+UPDATE tbl_spgist SET c1 = 100 WHERE c1 = 2;
+UPDATE tbl_spgist SET c1 = 1 WHERE c1 = 3;
+DELETE FROM tbl_spgist WHERE c1 = 5 OR c3 = 12;
+DROP TABLE tbl_spgist;
+
+/*
+ * 5. Alter column type.
+ */
+CREATE TABLE tbl_spgist (c1 int, c2 int, c3 int, c4 box);
+INSERT INTO tbl_spgist SELECT x, 2*x, 3*x, box(point(x,x+1),point(2*x,2*x+1)) FROM generate_series(1,10) AS x;
+CREATE INDEX tbl_spgist_idx ON tbl_spgist using spgist (c4) INCLUDE (c1,c3);
+ALTER TABLE tbl_spgist ALTER c1 TYPE bigint;
+ALTER TABLE tbl_spgist ALTER c3 TYPE bigint;
+\d tbl_spgist
+RESET enable_seqscan;
+DROP TABLE tbl_spgist;
--
2.28.0
v1-0002-Add-VACUUM-ANALYZE-to-index-including-test.patchapplication/octet-stream; name=v1-0002-Add-VACUUM-ANALYZE-to-index-including-test.patchDownload
From eb0ed1054b766bd110b0d1675a93065c0185a60a Mon Sep 17 00:00:00 2001
From: Pavel Borisov <pashkin.elfe@gmail.com>
Date: Thu, 27 Aug 2020 19:55:37 +0400
Subject: [PATCH v1] Add VACUUM ANALYZE to index including test
---
src/test/regress/expected/index_including.out | 1 +
src/test/regress/sql/index_including.sql | 1 +
2 files changed, 2 insertions(+)
diff --git a/src/test/regress/expected/index_including.out b/src/test/regress/expected/index_including.out
index 8e5d53e712..6a2a13ffa2 100644
--- a/src/test/regress/expected/index_including.out
+++ b/src/test/regress/expected/index_including.out
@@ -146,6 +146,7 @@ select * from tbl where (c1,c2,c3) < (2,5,1);
-- row comparison that compares high key at page boundary
SET enable_seqscan = off;
+VACUUM ANALYZE tbl;
explain (costs off)
select * from tbl where (c1,c2,c3) < (262,1,1) limit 1;
QUERY PLAN
diff --git a/src/test/regress/sql/index_including.sql b/src/test/regress/sql/index_including.sql
index 7e517483ad..1f300fe3b6 100644
--- a/src/test/regress/sql/index_including.sql
+++ b/src/test/regress/sql/index_including.sql
@@ -78,6 +78,7 @@ select * from tbl where (c1,c2,c3) < (2,5,1);
select * from tbl where (c1,c2,c3) < (2,5,1);
-- row comparison that compares high key at page boundary
SET enable_seqscan = off;
+VACUUM ANALYZE tbl;
explain (costs off)
select * from tbl where (c1,c2,c3) < (262,1,1) limit 1;
select * from tbl where (c1,c2,c3) < (262,1,1) limit 1;
--
2.28.0
Pavel Borisov <pashkin.elfe@gmail.com> writes:
[ v10-0001-Covering-SP-GiST-index-support-for-INCLUDE-colum.patch ]
I've started to review this, and I've got to say that I really disagree
with this choice:
+ * If there are INCLUDE columns, they are stored after a key value, each
+ * starting from its own typalign boundary. Unlike IndexTuple, first INCLUDE
+ * value does not need to start from MAXALIGN boundary, so SPGiST uses private
+ * routines to access them.
This seems to require far more new code than it could possibly be worth,
because most of the time anything you could save here is just going
to disappear into end-of-tuple alignment space anyway -- recall that
the overall index tuple length is going to be MAXALIGN'd no matter what.
I think you should yank this out and try to rely on standard tuple
construction/deconstruction code instead.
I also find it unacceptable that you stuck a tuple descriptor pointer into
the rd_amcache structure. The relcache only supports that being a flat
blob of memory. I see that you tried to hack around that by having
spgGetCache reconstruct a new tupdesc every time through, but (a) that's
actually worse than having no cache at all, and (b) spgGetCache doesn't
really know much about the longevity of the memory context it's called in.
This could easily lead to dangling tuple pointers, serious memory bloat
from repeated tupdesc construction, or quite possibly both. Safer would
be to build the tupdesc during initSpGistState(), or maybe just make it
on-demand. In view of the previous point, I'm also wondering if there's
any way to get the relcache's regular rd_att tupdesc to be useful here,
so we don't have to build one during scans at all.
(I wondered for a bit about whether you could keep a long-lived private
tupdesc in the relcache's rd_indexcxt context. But it looks like
relcache.c sometimes resets rd_amcache without also clearing the
rd_indexcxt, so that would probably lead to leakage.)
regards, tom lane
I've started to review this, and I've got to say that I really disagree
with this choice:+ * If there are INCLUDE columns, they are stored after a key value, each + * starting from its own typalign boundary. Unlike IndexTuple, first INCLUDE + * value does not need to start from MAXALIGN boundary, so SPGiST uses private + * routines to access them.This seems to require far more new code than it could possibly be worth,
because most of the time anything you could save here is just going
to disappear into end-of-tuple alignment space anyway -- recall that
the overall index tuple length is going to be MAXALIGN'd no matter what.
I think you should yank this out and try to rely on standard tuple
construction/deconstruction code instead.
I'd say that much of the SELECT performance gain of SP-GiST over GiST is
due to its lightweight pages, each containing more tuples so we can have
less page fetches. And this is the main goal of having lightweight tuples.
PFA my performance measurements for box+cidr selects, with gist and spgist
indexes built on box key-column and cidr (optionally) include column.
The way that seems acceptable to me is to add (optional) nulls mask into
the end of existing style SpGistLeafTuple header and use indextuple
routines to attach attributes after it. In this case, we can reduce the
amount of code at the cost of adding one extra MAXALIGN size to the overall
tuple size on 32-bit arch as now tuple header size of 12 bit already fits 3
MAXALIGNS (on 64 bit the header now is shorter than 2 maxaligns (12 bytes
of 16) and nulls mask will be free of cost). If you mean this I try to make
changes soon. What do you think of it?
I also find it unacceptable that you stuck a tuple descriptor pointer into
the rd_amcache structure. The relcache only supports that being a flat
blob of memory. I see that you tried to hack around that by having
spgGetCache reconstruct a new tupdesc every time through, but (a) that's
actually worse than having no cache at all, and (b) spgGetCache doesn't
really know much about the longevity of the memory context it's called in.
This could easily lead to dangling tuple pointers, serious memory bloat
from repeated tupdesc construction, or quite possibly both. Safer would
be to build the tupdesc during initSpGistState(), or maybe just make it
on-demand. In view of the previous point, I'm also wondering if there's
any way to get the relcache's regular rd_att tupdesc to be useful here,
so we don't have to build one during scans at all.(I wondered for a bit about whether you could keep a long-lived private
tupdesc in the relcache's rd_indexcxt context. But it looks like
relcache.c sometimes resets rd_amcache without also clearing the
rd_indexcxt, so that would probably lead to leakage.)
I will consider this for sure, thanks.
Attachments:
for_site_spgist_gist_covering_time_by_rows2_lines.pngimage/png; name=for_site_spgist_gist_covering_time_by_rows2_lines.pngDownload
�PNG
IHDR {4�� iCCPkCGColorSpaceGenericRGB 8��U]hU>�sg#$�Sl4�t�?
%
�V4�����6n�I6�"�d������83���OEP|1������ (��>�/�
%�� (>���P�����;3�i���e�|����{��g����X����-2�s���=+�����WQ+]�L6Ow�[�C�{_�������F qb�������U�vz��?�Z�b��1@�/z��c��s>~�if�,���USj������F�1��_�Mj�����b�u���p�a��m�h��m�����>��a\�+5%��Q�K���F��km}������?�������D\���������!~��6�,�-��7��S��������v��5Z��;���[���r�mS�����5��{yD���yH�}r�9��|����-���������FA������Jj�I.��[/�]m���K7�K���R��D��r��Y�Q��O�-����Q��|�|�6���
� (�0��
MXd(@��h��2��_�f��<�:����������_�����*d�>������e���\c?~,7?& ���^2I��q2"y�<M���d���JlE^<7����3R��E�9���`�3*L\S��,��#�)�]���_�\�,7Q����W��_���2�+�j���W��r�Z���L��lXswUm��������q��WF~������]<Yo.F���j�VN�D������,�'}(����}�}�}�}�]�;�����.ps_��j�Z�{y�g��k�J!#lr�6�Qa2�'cBQ�������/�=c���\�.V����M�UUT�p�)VoM8�A�$Cd��6T��W��"�O�RiS;S���A���v�m����n�R��c�}Y�:n�
�wK��b�6*��������L�hS��mZ������2�[.G����?���.��� ����#n���8�������H|�������2x~�����s��-��7;����t�>@�� g���|U\� @ IDATx��x����W�K��{�6�6��0�a��&$�G!$�����.���@�����.5�@z�1�cLq��{�-[V��������]�������<��9g��9�[�_������@ @ @ @ @ &c��x(@ @ @ @ 0 �@ @ @ @ � �/�GC @ @ @ @� @ @ @ @ �a��Xx���ill��Gq�3$$$����u�Hbb���x_�\s�[�y�v$))IZZZ��G����*�� ��w�?Z[[���z��V�;xk�� ��:�Fu�999~�q�W f�555��\�s�E�����=�Kv���|��u ��n���dIII���:ww�����TQQ�>|�(�{V���]���\�g��xg������{��g���KSSS�t:N{���j<��� ����<v�08�_1� � � � � ����� � � � � 1/@ 0�_1� � � � � ����� � � � � 1/@ 0�_1� � � � � ����� � � � � 1/@ 0�_1� � � � � ����� � � � � 1/@ 0�_1� � � � � ����� � � � � 1/@ 0�_1� � � � � ����� � � � � 1/@ 0�_1� � � � � ����� � � � � 1/@ 0�_1� � � � � ����� � � � � 1/@ 0�_1� � � � � ����� � � � � 1/@ 0�_1� � � � � ����� � � � � 1/@ 0�_1� � � � � ����� � � � � 1/@ 0�_1� � � � � ����� � � � � 1/@ 0�_1� � � � � ����� � � � � 1/@ 0�_1� � � � � ����� � � � � 1/��O�"� � � �L���M���G�[[��!C%)�q%!��!@�� v,�"� � � �(p��O�������HO����F����e,�k� bG�_�����I@ @ @ �nx�����?}���zyi��n�'�#� t]� `�
i@ @ �y������OW�<���:�2
@ �%@ �]��� � � � ��nl��%�on�����}}�(@ p� @w�z� � � �J@o�q��%�G :�u�� �n�� ���F��yw��;@ @ @ ��]����{��ili�{� � ��VM���r������~*-��� ��o��C�zE�m�&O?���X�Brsse��Er��W{�s� � � ��oo�W}�FM �pp�@XG �}?������X~�������.�������Djjj�z������j���F���'���{�
^�@ @ ��-GK�����y�[ # m d@
�5 x��A��e�\u�Ur������3�����#G���5k���zJ���7�|��5��w��W�s�='���.$�K � � �@l ����_}�g��oM�&����mje
�� �B�� ���%!!�;�O{�i�:eee�W�\i������c���RQQa�� � � � �\���U~���r�����Y�����%�II�rF Z8� � ���yyy�`�y��g$##�X���G�1c�����
����
���c���#� � � �X����+��oiu����=.�D5�� ���1���T��|�o�d�$�C_�T�����o�~Q� �+��@���!���w�������G>����|���^Po�aN��N����S_H_z�%#?q�D�<y��]HII�{�[���HR���u��H�HLL�����z����������w|��H��sjj*������=�������{��y�<����~Z-��������������+����(OOI��OP?��=�B"=��l()�����������y�_I �@8� �#������ ����V2��z4�u�]'��s��1��o`���777{�����3"�����O��$]��:PAr��~W�/��'�CO �[�_��;�+��+�C�F�{��+�Wt�/�F��
��������Z���C?p�%2�W/oyZ�����j�W�<F�o��C�v��UW����?c�r��+� ��~(z���w�-0�TO�������W_�o�Q���������)X'0��a���m��fd+++E�B'�����B�:-�R@�����'�[@�+=��N-�Mr���===��-�~M���=�K�������m�UF���I�R���3�� ������\�������y�t��3ef�>����������gQ�]�����\�����?�e��(EyF��y�K������bC �C���_/���4��f5c���t*P������c���~�����@ @ @ �-����^��*�gN�����M7���Cb����8l��K���Y5��� DR ����{�!5������zZ���k����(�>}����=���+V��F����@ @ @ ����|��?r���(50�ws�{���O 4k�:���e�}��=���}]��� I�� -Zd|���[Eo��w�^���`l�q���z:��|�}��W�^-O>��\{���)��D�� � � �@,<�z���u��Q��2�p�d��d� ������u�<�f���������C���B"#�5 ���'w�y��u�]���}�xb����~�3c�.(,,46���<������3g�\v�e��� � � �@
|�o���F��S��`��/��y�b��= ���b9o����r��e>�}���2\�qOB � � �~�i��;��u����}������< �f���_~YJ���zj�^8�� � � �}�c�^�����n�}��0�\�s��6�1��8 ��M?nx�i�|w��2w�p3� �@D"Y��}J:(��o�@U8� � � �A�5����,
�+/;V��8�R��� �;;]�ez��/yC�%�����C��3f��8F ".� `��� � � � q&��F����R).+�<���}��������� ��@=R�Am`Y�>����&���d��}��yyr������@ 7 t���/ � � �@7
���������;ee���.�����{�� ����r����&�S��;Q��t��V�Ty�
��)�������N����(&��7�x@m�����^d@�%�}�wIg� � � �@����!���'��S����A�`S{@=�������;��5��4Y�Z$35E&��m�t@����w:�/(w�����'����i����������
��z �@� �Z��@ @ @ � lW�0��l�O�~9�<����v$� ���������M���
�q��-,�y�G����\ �1�< � � �����z���W��&K
=2���GGSJ����M�-�&>���������������c�| �� ��H�@ @ @ 'v�}]�UVX��0@n:���`3�� L��m7���6����#E2R��dIS�2s^�N�=Q�S�[���e'�?Q��2mim���{KVjj��O{ �@H ����@ @ @ ��^�T���S���\���$������z�@)I��� ����G<�9]� �8#(�d���� �% w"���������I��:�7''�i�!� � ���Ag@ @ @ ����y�<�v��1D�����##�R���= ��v��$�1��
�h�{^`������%%)�S�� � a &hn� � � �K`������,�������Q��>�)��M@��H���q�z���<[)Y@ �p
XWo
��� � � �@�Jkj���^s`N���O�!�G�����@�����Z��JI��9��22 � � ~s�� � � t������krD�i�����g��:}l�l4n=j
�*(��� @ � #���@ @ @ �� ���wdM�!K{�{����-Y �#��]�nly2 � � �i:.D @ @ �#�������6Z:���&�p�d��Z���� [���U}���6� `W��� ������@ @ @ �����O���}�����{,��=zX���� u{z���:�lh�4?��2 �@�)y�� � � �@`_y������G���Of�%g
l.
�qjb�O;: h_�OW�_�S�@ ��00���@ @ �@]S�|�����:K{�G����L���*c�X��w�� <(//�S�C��A �Q�����yf@ @ � �[�Z&JJ,�rZ�^�������2�o
�}����P���@ � @ �x\� � � DJ�?�D�(�n�}AF�<��BIK���^� Y���j� � Q~n� � � t\�������[.LIL��T��Ov��<�� `���cOE��V�� ��A�� �H�so@ @ @������,Y"�-?D���sej�~l����� �TZ*��MH�q[�@ �K� `w��. � � b=���_}Yj�--_9i���i�-e���#
�i�a�:��))�?'�^�< �@�|�rG�#�@ @ @���aw��7dO�u�����-j�_�����G[n?Zm ���`)#� 9����� � � �@���Cyo�K������W�*N�r-C�q�WqY���Y���A�� �H��� � � �#���m���U�Z�j��.\,�YY��pd�� ��� �[� �@� ���& � � v�M�G���/���o��+��z����� ��SO&!� �G� �{�=A @ @ ,eu�r���H}s��������Q�,e���$&�������9� ^����n � � �@PM--���_�C����g�M?n�y��,��@# ��IVjj����@ ��)@ @ @ R�}�]���A����!w�_(��a7P ���,�� �
��x
t@ @ 8%���
��
N����T���INZ��<��P�� �@� v��+@ @ @ ���:tP~��;�vT��dh����He FJ��"� � �97�B @ @ B.PR]%7���4��Z�������!C-e�� �p$_
�F :�P� � � �W�A��{�
�����x����w�uSJIr��dfJ���uSW� ��p��
� � �@Xn]��l<r�r�1��r��s-en��$%9vc����
J;F! � ��/ �� � � �xh���������HO�-�5��m��`�$!� �O� ��� =B @ @ �8�p�����,O��F��w�"�������z @�} ��N� � � �����r���7������??�9�� K��2��� t�[�/ ��)��,8B @ @ �&P��(����T64X�y�����&Z���IIt^pdA���J@ � @� � � � ahS#����%��x�������[�9�R�������y�������'@ � ��_ @ @ @ ���X����r�^YY��N�5KEd��8���.x5tp ��B! � � �=K���?�������T0�S pLaQ4t�>"� q)@ 0._;� � � ��v���l�R�[�r�y2�wo�r�8 G��[_�B X�� � � �@8�����j����f����<E�8f��������.�.,�)� @�� t�{� � � ������7^�U���<s�@����,e���`�K7Tn@n��� ��{ ��]�@ @ �Q�����<���t����.�����oY~F��Y�$A}�@ �)}�i��H�@ @ @ G�m�$O�[k9���bl����n)���Mg��t��Q�,y2 � �HvWw�
� � ;kKJ������@w��/#
|���`lQ�|��k��������-]�� �@\
����C#� � �@w����^U�Z�f\?��;lxw��������+'N���p@ ��0����� � � X[����*:hN�&�;}���c@ �]� `�s@ @ �7=�w���c����;��g��
@ �p �2�@ @ @ �����5�������KK�?_x�d��Z�� � ��P� � � ��'w}���Y������<K9@ �%@ 0\��@ @ bZ`e������������y�d��A�22 � �S� `8�� � � ��@mS�\��+RQ_oy��G���&O���A @ � �-��@ @ @ ������-[*���<��^��W��g)#� �@$ FB�{"� � �@���g���;vX��03SXt��&%[�� � ��P�� � � �w��V~ly���D����Wv��� � � )y�� � � Q-���L���%>�p��sdr��>� � )����� � � ��
j���Eo�aNWN�$��;�\�1 � �W@@ @ @ �Z��?^������t{F���/����A @�
��� � � D��=}(��k�o�����$Y��GB @�m|wr��? � � �Z���n�GV���/#9Y�t�b���a)'� ��[ ��M�@ @ p���#G���o������dLa�O9 � �E� �[��@ @ @ �
���^{EZZ,}����e����22 � �M� ����A @ @ W 4���_UJ��-�:g������22 � �Q� ��
}B @ @ ����we��C����S���@,�d@ p� @7��� � � �xz�:yn�K_rRS��j����4K9@ �*@ ��o�~!� � �@D>;x@~��{�>��~w/X(Cz����A @�� ��v� � � DD�`U����k���j���f�%����A @�� ���� � � �U���F���?�x}����F��k�N���A @ �����@ @ �n���k���-o-���6����o��k)#� �@� ��7E?@ @ @�[��j��6�3���gOz��?-Z,�����nC@ :���^"� � �@�69"��[#�m�.M�-~[���3�oN����@ @�� ���� � � �L@o��l�yb�Yu�P�����K��8��zT@ @�� ��v� � � �D���N���Q�Z�N�Tls@N��=d��7t��48`]N"�
��-�G@ @ �����G��~/o�*�j��@�����M�l�U� � Q%���RT���������p E��������i7ddd����h���x�Cbb������x�p��'���effJ��B���|v��Y���SRR���
���8�i4��jQ�|���|��S�x���o1C���|��f��2��0`]���������U=7��IIIF���{vv���tJ &F �ot�'�S���E9j����*��9����:5���L�����I��9�������:P���|t��h�|�jO�=����D���[�y'�z�����Q,/m�"�JJ���6��I?���^���q�InZ�Q7Z�n���K�z������������NnlA 0�O��!����M@ @ ��;�_[�����?@�>i�Z�o�0��].* � �� �y�< � � �*P���y|�����F�.=F��7IFE�4_�� @ � U@ @ @�������:,m�W�e�� �K���j�, @ �x �o��F @ @ ^W�}��e��I������-W��(Ij]W � �.@ 0���� � � D��A�i�����{��������C,�d@ �g~�o�gG @ @ J�����\"Uj�?s�a���� #� � �� @ @ �N���}*�<h���~�����[�� � � @� � � �@� �-)�?������T�s�����@ @�*�@�9@ @ p�@�����K�H�m��_�w����qq�� � � 9{�� � � ��{����
�U��+G����A @ �S OYp� � � .xm�6yq�K����g�c)#� � V�Vr � � �B��UU����[z���(w�_(Yj�? � � ���3 � � ����V��Z��J��gN��8C&��m.�@ :�P� � � ��������Z:tz��r����22 � �,@ ���R@ @ �&�U*��b�^�#��KkK�_>Yi����&w��/� �r2 � �,��\L) � � Z�������%�ee����k��;d��+�%�eTA�������FM��IM�mi��O�_�9O�d��*�@
��I@ @ �����ryx�*���M�l������i|x����"=�3���Y���K����s��q�`�HOu>#� �@ �@�
� � t\`S�y���e��bi�����ZMS���48/O~v�9N�(C @ � �p
@ @ :.�r�>�?��H���t�����F�@ @�c ;�Em@ @ p�#����C�>�
G�8�8U4F��7�w��y�Q���=*{+*�uO��dd�������@ @ (�A1Q @ @ �[Z���[�����r�*F���w�Z���i�dLa�O�Z5�w��c���a���'R�6 IJL���T��������k(@ @ �� �D-@ @ 0 �z���A[�Z�����XS���2�q�5S����<�ISNO����������Lg8D @ ��
�� �#� � Geu��������uR�����sRS�� ��I�� 3�o=N � �@� �~c�� � �@�����GV��m�(
j���T��e��?A�T�� � � �w@@ @ p��V�A���>���o����4�G����r�������@ @�= ��.� � � ��x�^�������{�i|�^r���2w�p�}�@ @�} ��N� � � A���MY��X���W��/��
7:������v���.k�lf��A���3X�� � y��� @ @ �N <�a�<�e���[�Z&;����g����Doy{MjM�W�m��V}.;��[]��[0b��W��o=N � ��� ��}�@ @ �x�n���^%�i�z4���}|��j���7��kVKIu����8%1I.;V�Qk����H � ] ��}�[@ @ �������m���W�{V�>i��8s�d���O���:yb�ZyJ}T44X��3�j����|��4Y���$@ �N�����5 � �@��������_���>�v���s��v�9{�9PYi������R����
33��3��KG����4�(G @ �( %/�n"� � `X�{�j�s���Jkj�Er��J���K��ze���j9k���S�|����c��GO�R��@ @ � F�;� @ @ �P`�������������}$�lXo9�3GU��_[T$�N�fl������ !� �@, ����� � � ��@uc���������7������G��_,_.;��Y��3g� ��6]f
d?E@ bH� `�L@ @ >���[�Z��yu �����'������&�d����Y�����2�woo � � c���d � � .����eiq������,�X�;���;�-��),��y����$����c�Z���K��$5�W����/�q��Y�� � � c��x � � ���;o��O��W���^����%�u��z�����\3o�K���@��{��RVW'z���$�`��@ �x��<�e�@ @����i����K>��{�H�
���a�����=����?H � ��o�N<5 � � ]�NM���G�5��X���,�a��wp�2�����8� � q,@ 0�_>�� � ^�!={��p��#>e�
�T�����,���3��R� � �� ��|�<4 � �@$j��|n����>e�
>=p@*,���g�H@ �V� `��z@ @ �U�������� ��i���Ov�L�����~��@ �h mo��"� � D��S p������3��:o��n"����6J@ ��q��yH@ @ 7T64�t�B�
�TZ�Sn/XSrH���Z��
����2@ �[�q��yp@ @ �pT5Z��������9��yi�u�o���������@ @ < =|F @ @��� ����w���e;wX�=o�0IJ��y
@ p�'G
@ @ �����;�45Jy}�'��yS�9XUe)g���� � �S � � �J����_���d��_�}������JI����p� � � ��� � � �N �@}���o�{�7m�s����$��9� � � �58F @ @���v 6�j���RVg��W��u���8^f�*������ � � � �� � � !�� ���kmk��������o������Q�h@ �A� `<�e�@ @ �U�5 {ge����,�z�x�%�3o�v���������S�@ @�� @2�#� � D����W����9�<$�**"�,�
�>����#GZ�>;p@Jkj�ez���G�x��`����<@ @�=�� q@ @ ���w}��P��*�v�3OI�m^��>87-U��) ����������wNNL�9C����@ h_� `�F�@ @ �2�7�oL����&�[�Q�N# 33���<�4>����I��g���j� @ �� ��hQ@ @ *t`���&�L�5 sSO����W:$����Xm��:t���yL��x�A @ �� �D-@ @ � ��y�\������������jmfta��@��T�����i����%&�'�,)�.o��i��k�?������� � �"@ 0V�$�� � D��k�7���Z&������%r��KSKK�{�����fqY�OY8�k �M@t����3��oW��[�7�o_q�h�D@ p ��B � �@�<�v�5���:c�sY(�����B�t��� �k�]0r��]�x�>K���#,y2 � �@� ��� � �@�
��k�5��J���.�kn@�����=�"���XdG V6X�;{F ���US{����b[�P�!!� � �������@ @ � N8�R3-)I�B ���Kd�~��9}�]���E#��m���C�9`����-*����~�s@ $@ 0��@ @ B+`�f������z�lTk
����d?�������e���������OX��$z��9�G ���#G�O[��1���A@ :&@ �c^�F @ hG�x]�\��+�N��`�=���bz�n(�u?]k��e���EmP�����_
W:$�|�_�:������1���z���4`]����p�O@ �� ���q � ����ew�#��];��~�y�yYu���fK�o�O_�TZS#G���6W|,p p���k�������T����DNZ��L�4�R�3C�z���|�r
@ @ �` +E=@ @ �v�� �}���[7{�km�N��'�v��^�� ����o�R�X�h��C�X|r��@Mu�i`Nj�������S6���� � �1���6 � W:�V�0z�����>�/-)�[�>X��������M~`��N��C��!�F�; '&$HV�u��nJ�>>��S��@ @ �� ���"� � 1*�w�����e��c�_�?Yn�����N��5�f�v�u��7�������f��:^����{�k�9]\�� ��^�t*��]�����WY�&� �S����23��t�k�N��9@ @ :#p������k@ @ bR����4���<���T���7%O��k/%����M��ZMc�6�e �@;��6�����
����{���?v��������'I��OQ���9���[�X�V>���������BS�OWJV�����o���K�Z{����$=��} )@ @ � ��D���� � �����*���u���x� ��?��S�@��w
-��{u%c ��/����y���l ��k��?r�r�YYw�*��������`�g�T����o�$O�['�*+�����bjR�7o>�����0�\�1 � �%� w���@ @ ��]X��`}sr|��)0��3�<�64�n����Cz���=zxne|6o��������n��^p���?�r�Sj4���jK�'�j����?&g=�W�����R�(�_�7W���$@ @ � �2�@ @ �L���lAF���i�$��?�z[8� WUZ�8 7���N��'�7yk�N�Tg]��S�J�����3�����9�e���"����r��s�e��7��#O��Sp�
��>B�:T���/���-m�A @ �S� `w��6 � �>�x���9>�9 ��6���n��_�Z�LI�7w����3����u������I�������������^��M��SE=��-�=�>���������N����Oe
@ @ �n �M�4� � ���yoG��|���Pm����%�)�:PW`�hc�
��.�Q,���<����CO��j�9 ������r���F�u�K��:<�=����+WN�$��
�?k�y��@ �n��-L� � �@
�G�������3���_���Z�TR]%�`�n�n�O�M*��K��y��w���m���EE�p�Hy�x���+�����M�����?�~�-�z��__$�j)'� � DR�M@"���@ @ �
�~�� ^�.���u�A�V���:�F�N��w��O�,��Z?5x��$��~����+�����t�
��'��~��3fZ�?���7�t����{�� #�H��BB@ \ ���:E@ @ �%�j��#P�����M@� 77j2�9=%��G���&�z=��%+5�(����8^�������c}P��%_?�R����S.=F^0������(�T1>�$&�"�@ @ � Dt��o��@��7��@ @ t�@^W~.3� lQ;{��`��_�C�O�c�i����|�m-�E#GIj������w�5�-G������N�^��k1!� � ���SN7�O�������5E���Lf��%�\r�L�2�{�m����O?-+V��\����E�������� @ @�{�,WK
&