Unusable SP-GiST index
While trying to find a case where spgist wins over btree for text, I
came across the following behavior which I would consider a bug:
CREATE TABLE texts (value text);
INSERT INTO texts SELECT repeat('a', (2^20)::integer);
CREATE INDEX ON texts USING spgist (value);
SET enable_seqscan = off;
TABLE texts;
That produces:
ERROR: index row requires 12024 bytes, maximum size is 8191
It seems to me the index should not be allowed to be created if it won't
be usable.
--
Vik Fearing +33 6 46 75 15 36
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Vik Fearing <vik@2ndquadrant.fr> writes:
> While trying to find a case where spgist wins over btree for text, I
> came across the following behavior which I would consider a bug:
> CREATE TABLE texts (value text);
> INSERT INTO texts SELECT repeat('a', (2^20)::integer);
> CREATE INDEX ON texts USING spgist (value);
> SET enable_seqscan = off;
> TABLE texts;
> That produces:
> ERROR: index row requires 12024 bytes, maximum size is 8191
Hmm ... it's not really SP-GiST's fault. This query is trying to do
an index-only scan, and the API defined for that requires the index
to hand back an IndexTuple, which is of (very) limited size.
SP-GiST is capable of dealing with values much larger than one page,
but there's no way to hand them back through that API.
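To make the size limit concrete: an IndexTuple stores its length in a 13-bit field of its header (INDEX_SIZE_MASK, 0x1FFF, in access/itup.h), which is where the 8191 in the error message comes from. The following is only a standalone sketch of that kind of check, not the server code; the sketch_ names are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed to mirror INDEX_SIZE_MASK: 13 bits of length, so at most 8191. */
#define SKETCH_INDEX_SIZE_MASK 0x1FFF

/*
 * Returns 1 if a tuple of `size` bytes fits in the 13-bit length field,
 * 0 if any higher bit would be lost.
 */
int
sketch_index_tuple_size_ok(size_t size)
{
    return (size & SKETCH_INDEX_SIZE_MASK) == size;
}

/* Self-check using the two numbers from the error message in this thread. */
void
sketch_self_check(void)
{
    assert(sketch_index_tuple_size_ok(8191));   /* the stated maximum */
    assert(!sketch_index_tuple_size_ok(12024)); /* the reported row size */
}
```

With these definitions, the 12024-byte row from the report is rejected, while anything up to 8191 bytes passes.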
Maybe we should redefine the API as involving a TupleTableSlot that
the AM is supposed to fill --- basically, moving StoreIndexTuple
out of the common code in nodeIndexonlyscan.c and requiring the AM
to do that work. The possible breakage of third-party code is a
bit annoying, but there can't be all that many third-party AMs
out there yet.
regards, tom lane
I wrote:
> Maybe we should redefine the API as involving a TupleTableSlot that
> the AM is supposed to fill --- basically, moving StoreIndexTuple
> out of the common code in nodeIndexonlyscan.c and requiring the AM
> to do that work. The possible breakage of third-party code is a
> bit annoying, but there can't be all that many third-party AMs
> out there yet.
After looking a bit at gist and sp-gist, neither of them would find that
terribly convenient; they really want to create one blob of memory per
index entry so as to not complicate storage management too much. But
they'd be fine with making that blob be a HeapTuple not IndexTuple.
So maybe the right approach is to expand the existing API to allow the
AM to return *either* a heap or index tuple; that could be made to not
be an API break.
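A minimal sketch of the "either a heap or index tuple" idea, using simplified stand-in types rather than the real IndexTuple, HeapTuple, or scan descriptor. Every name here is hypothetical, and the rule that the heap-format tuple is checked first is an assumption of this sketch, not something settled above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for PostgreSQL's IndexTuple and HeapTuple. */
typedef struct SketchIndexTuple
{
    size_t      len;    /* limited to 8191 bytes in the real format */
    const char *data;
} SketchIndexTuple;

typedef struct SketchHeapTuple
{
    size_t      len;    /* no such small limit */
    const char *data;
} SketchHeapTuple;

/* Hypothetical scan descriptor: the AM fills exactly one of the pointers. */
typedef struct SketchScanDesc
{
    SketchIndexTuple *itup; /* existing field; btree keeps using this */
    SketchHeapTuple  *htup; /* new optional field for gist and sp-gist */
} SketchScanDesc;

/*
 * Caller side: use whichever tuple the AM returned and report its length.
 * Returns 0 if the AM filled neither pointer.
 */
size_t
sketch_returned_len(const SketchScanDesc *scan)
{
    assert(scan != NULL);
    if (scan->htup != NULL)
        return scan->htup->len;
    if (scan->itup != NULL)
        return scan->itup->len;
    return 0;
}
```

An AM that only ever sets itup keeps working unchanged, which is why adding the second field does not have to be an API break.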
regards, tom lane
I wrote:
> After looking a bit at gist and sp-gist, neither of them would find that
> terribly convenient; they really want to create one blob of memory per
> index entry so as to not complicate storage management too much. But
> they'd be fine with making that blob be a HeapTuple not IndexTuple.
> So maybe the right approach is to expand the existing API to allow the
> AM to return *either* a heap or index tuple; that could be made to not
> be an API break.
Here's a draft patch along those lines. With this approach, btree doesn't
need to be touched at all, since what it's returning certainly is an
IndexTuple anyway. I fixed both SPGIST and GIST to use HeapTuple return
format. It's not very clear to me whether GIST has a similar hazard with
very large return values, but it might, and it's simple enough to change.
regards, tom lane