Gist consistent and compression
Hello list,
An idea for possible improvement of query speed on gist indexes with a
costly compression function. We have a gist index that uses a compressed
internal datatype. The compression is with some cost (it involves a
syscache lookup). The functions called by the consistent functions are
binary functions on the internal type. Since the query parameter passed
to consistent is the 'external' datatype, it must be compressed by the
consistent function. When performing a query, the same data value (the
query) is compressed repeatedly while traversing the nodes in the index
tree. The query speed increased substantially after building in an ad
hoc caching (a hash) in the compression function.
We believe that every gist index that uses compression could benefit
from a cache, even a small one. So it would make sense to add the
caching to the gist support in the backend, instead of having to build
one in the user defined functions gist functions for every datatype.
Any thoughts?
regards,
Yeb Havinga
Hi,
Le 1 mai 09 à 23:04, Yeb Havinga a écrit :
An idea for possible improvement of query speed on gist indexes with
a costly compression function. We have a gist index that uses a
compressed internal datatype. The compression is with some cost (it
involves a syscache lookup).
[...]
small one. So it would make sense to add the caching to the gist
support in the backend, instead of having to build one in the user
defined functions gist functions for every datatype.Any thoughts?
It seems to me that what you're asking for is addressed indirectly in
the possibility to make your internal data type a full SQL visible
datatype. Then you store this new datatype directly in the table and
index that. Instead of converting from external to internal type at
consistent() time in a query, provide an implicit CAST for external to
internal for queries to "just work" without editing. The CAST will get
called once per literal in the query.
See prefix and the prefix_range datatype as an example of this:
http://blog.tapoueh.org/prefix.html
I'm not sure if the caching facility should be added to -core
PostgreSQL or not, I'm just proposing another way to solve your
problem without requiring it...
Regards,
--
dim
Hi Dimitri, list
It seems to me that what you're asking for is addressed indirectly in
the possibility to make your internal data type a full SQL visible
datatype. Then you store this new datatype directly in the table and
index that. Instead of converting from external to internal type at
consistent() time in a query, provide an implicit CAST for external to
internal for queries to "just work" without editing. The CAST will get
called once per literal in the query.
Thanks for your reply. If there would be a way that the user never sees
the 'internal' type in query results (with some rules perhaps), it might
work. But somehow it feels 'not right'; the user datatype is the
external one, and how the gist index works internally should not be put
as burden on the user. The compression is also lossy, so recheck is
necessary and I think that will not work if only the internal type is
stored in the relation.
See prefix and the prefix_range datatype as an example of this:
http://blog.tapoueh.org/prefix.html
I wish I'd seen the slides on that link half a year ago, since I think
they are helpful to anyone that wants to start writing a gist index. I
remember using google translate on a russion page about gist in the
search for information. Besides that also reading Guttmans r-tree paper
was good, to have any idea what picksplit, penalty and union are
supposed to do.
regards,
Yeb Havinga