Is it safe to cache data by GiST consistent function
Hello,
When implementing a GiST consistent function I found the need to cache pre-processed query across invocations.
I am not sure if it is safe to do (or I need to perform some steps to make sure cached info is not leaked between rescans).
The comment in gistrescan says:
/*
* If this isn't the first time through, preserve the fn_extra
* pointers, so that if the consistentFns are using them to cache
* data, that data is not leaked across a rescan.
*/
which seems to me self-contradictory as fn_extra is preserved between rescans (so leaks are indeed possible).
Am I missing something?
Thanks,
Michal
=?utf-8?Q?Micha=C5=82_K=C5=82eczek?= <michal@kleczek.org> writes:
When implementing a GiST consistent function I found the need to cache pre-processed query across invocations.
I am not sure if it is safe to do (or I need to perform some steps to make sure cached info is not leaked between rescans).
AFAIK it works. I don't see any of the in-core ones doing so,
but at least range_gist_consistent and multirange_gist_consistent
are missing a bet by repeating their cache search every time.
The comment in gistrescan says:
/*
* If this isn't the first time through, preserve the fn_extra
* pointers, so that if the consistentFns are using them to cache
* data, that data is not leaked across a rescan.
*/
which seems to me self-contradictory as fn_extra is preserved between rescans (so leaks are indeed possible).
I think you're reading it wrong. If we cleared fn_extra during
rescan, access to the old extra value would be lost so a new one
would have to be created, leaking the old value for the rest of
the query.
regards, tom lane
Thanks for taking your time to answer. Not sure if I understand though.
On 3 Apr 2024, at 16:27, Tom Lane <tgl@sss.pgh.pa.us> wrote:
=?utf-8?Q?Micha=C5=82_K=C5=82eczek?= <michal@kleczek.org> writes:
When implementing a GiST consistent function I found the need to cache pre-processed query across invocations.
I am not sure if it is safe to do (or I need to perform some steps to make sure cached info is not leaked between rescans).AFAIK it works. I don't see any of the in-core ones doing so,
but at least range_gist_consistent and multirange_gist_consistent
are missing a bet by repeating their cache search every time.
pg_trgm consistent caches tigrams but it has some logic to make sure cached values are recalculated:
cache = (gtrgm_consistent_cache *) fcinfo->flinfo->fn_extra;
if (cache == NULL ||
cache->strategy != strategy ||
VARSIZE(cache->query) != querysize ||
memcmp((char *) cache->query, (char *) query, querysize) != 0)
What I don’t understand is if it is necessary or it is enough to check fn_extra==NULL.
The comment in gistrescan says:
/*
* If this isn't the first time through, preserve the fn_extra
* pointers, so that if the consistentFns are using them to cache
* data, that data is not leaked across a rescan.
*/which seems to me self-contradictory as fn_extra is preserved between rescans (so leaks are indeed possible).
I think you're reading it wrong. If we cleared fn_extra during
rescan, access to the old extra value would be lost so a new one
would have to be created, leaking the old value for the rest of
the query.
I understand that but not sure what “that data is not leaked across a rescan” means.
—
Michal
=?utf-8?Q?Micha=C5=82_K=C5=82eczek?= <michal@kleczek.org> writes:
On 3 Apr 2024, at 16:27, Tom Lane <tgl@sss.pgh.pa.us> wrote:
AFAIK it works. I don't see any of the in-core ones doing so,
but at least range_gist_consistent and multirange_gist_consistent
are missing a bet by repeating their cache search every time.
pg_trgm consistent caches tigrams but it has some logic to make sure cached values are recalculated:
cache = (gtrgm_consistent_cache *) fcinfo->flinfo->fn_extra;
if (cache == NULL ||
cache->strategy != strategy ||
VARSIZE(cache->query) != querysize ||
memcmp((char *) cache->query, (char *) query, querysize) != 0)
What I don’t understand is if it is necessary or it is enough to check fn_extra==NULL.
Ah, I didn't think to search contrib. Yes, you need to validate the
cache entry. In this example, a rescan could insert a new query
value. In general, an opclass support function could get called using
a pretty long-lived FunctionCallInfo (e.g. one in the index's relcache
entry), so it's unwise to assume that cached data is relevant to the
current call without checking.
regards, tom lane
On 3 Apr 2024, at 19:02, Tom Lane <tgl@sss.pgh.pa.us> wrote:
=?utf-8?Q?Micha=C5=82_K=C5=82eczek?= <michal@kleczek.org> writes:
pg_trgm consistent caches tigrams but it has some logic to make sure cached values are recalculated:
cache = (gtrgm_consistent_cache *) fcinfo->flinfo->fn_extra;
if (cache == NULL ||
cache->strategy != strategy ||
VARSIZE(cache->query) != querysize ||
memcmp((char *) cache->query, (char *) query, querysize) != 0)What I don’t understand is if it is necessary or it is enough to check fn_extra==NULL.
Ah, I didn't think to search contrib. Yes, you need to validate the
cache entry. In this example, a rescan could insert a new query
value. In general, an opclass support function could get called using
a pretty long-lived FunctionCallInfo (e.g. one in the index's relcache
entry), so it's unwise to assume that cached data is relevant to the
current call without checking.
This actually sounds scary - looks like there is no way to perform cache clean-up after rescan then?
Do you think it might be useful to introduce a way for per-rescan caching (ie. setting up a dedicated memory context in gistrescan and passing it to support functions)?
—
Michal