Incorrect cursor behaviour with gist index
I'm back, sorry for a long absence.
About this bug: http://archives.postgresql.org/pgsql-bugs/2008-09/msg00086.php
Unfortunately, GiST index doesn't work with change direction of scan. I.e. it
can't move forward then move backward and this behaviour was from the beginning.
I think it's fixable, although GiST doesn't store on page left links (only right
links) and doesn't store parent page pointer. That's needed because matched
entries in GiST isn't stored consecutively unlike to BTree. So, price for it
will be storing in memory or stack all numbers of visited pages in index even
they will not be used. In worst case it's 2^32 of GISTSearchStack. Right now it
occupies 24-32 bytes depending on sizeof(void*), and it will be needed to add
one more pointer to make double-linked stack. So, about 32Mb. Is it acceptable?
Nevertheless, it doesn't solve problem with concurrent page splitting what can
cause different order between forward and backward scan in one cursor.
--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/
Teodor Sigaev <teodor@sigaev.ru> writes:
I'm back, sorry for a long absence.
About this bug: http://archives.postgresql.org/pgsql-bugs/2008-09/msg00086.php
Unfortunately, GiST index doesn't work with change direction of scan. I.e. it
can't move forward then move backward and this behaviour was from the beginning.
I think it's fixable, although GiST doesn't store on page left links (only right
links) and doesn't store parent page pointer. That's needed because matched
entries in GiST isn't stored consecutively unlike to BTree. So, price for it
will be storing in memory or stack all numbers of visited pages in index even
they will not be used. In worst case it's 2^32 of GISTSearchStack. Right now it
occupies 24-32 bytes depending on sizeof(void*), and it will be needed to add
one more pointer to make double-linked stack. So, about 32Mb. Is it acceptable?
Nevertheless, it doesn't solve problem with concurrent page splitting what can
cause different order between forward and backward scan in one cursor.
Seems like a lotta work for a partial solution :-(. Probably the path
of least resistance is to teach the planner that only some index AMs can
do backwards scan. That would result in a Materialize buffer getting
placed in front of the query if the user demanded scroll capability,
but it would cost nothing in the more typical case where backwards scan
isn't needed.
It should be sufficient to specify this in pg_am, right? Or could the
opclass or indexkey details affect it?
BTW, can GIN do backwards scan?
regards, tom lane
Seems like a lotta work for a partial solution :-(. Probably the path
of least resistance is to teach the planner that only some index AMs can
do backwards scan. That would result in a Materialize buffer getting
placed in front of the query if the user demanded scroll capability,
but it would cost nothing in the more typical case where backwards scan
isn't needed.
Probably, it will be a better solution. In this case GiST code could be
simplified - remove support of backward scan (in any case not fully workable)
It should be sufficient to specify this in pg_am, right? Or could the
opclass or indexkey details affect it?
I don't see any examples where it depends on opclass or index key, that's is a
AM property.
BTW, can GIN do backwards scan?
No, at all.
--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/
Teodor Sigaev <teodor@sigaev.ru> writes:
Seems like a lotta work for a partial solution :-(. Probably the path
of least resistance is to teach the planner that only some index AMs can
do backwards scan. That would result in a Materialize buffer getting
placed in front of the query if the user demanded scroll capability,
but it would cost nothing in the more typical case where backwards scan
isn't needed.
Probably, it will be a better solution. In this case GiST code could be
simplified - remove support of backward scan (in any case not fully workable)
Okay. I'll go fix the core code, and you can take out whatever you want
in GiST/GIN.
BTW, I think that the support for ammarkpos/amrestrpos in these two AMs
is pretty much useless code as well. We only use mark/restore in merge
joins, and so it only needs to be supported by plan nodes that can
deliver sorted output, which GiST/GIN indexscans can't. Furthermore,
even if we had another use for the facility, it'd be pretty questionable
to use it when the AM can't guarantee to return the same sequence of
tuples after backing up. So I think it would be sufficient to have
gistmarkpos et al throw error if called.
regards, tom lane
to use it when the AM can't guarantee to return the same sequence of
tuples after backing up. So I think it would be sufficient to have
gistmarkpos et al throw error if called.
Why not to remove gistrestrpos/gistmarkpos/ginrestrpos/ginmarkpos from pg_am table?
--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/
Teodor Sigaev <teodor@sigaev.ru> writes:
to use it when the AM can't guarantee to return the same sequence of
tuples after backing up. So I think it would be sufficient to have
gistmarkpos et al throw error if called.
Why not to remove gistrestrpos/gistmarkpos/ginrestrpos/ginmarkpos from pg_am table?
First, because that would mean adding code to the indexam.c functions to
avoid crashing, and second because then we'd have to force initdb to
change our minds about this. I think having stub functions that throw
errors, rather than no catalog entry at all, is cheap future-proofing.
regards, tom lane