btree_gin and btree_gist for enums
Here is a patch to add enum support to btree_gin and btree_gist. I
didn't include distance operations, as I didn't think it terribly
important, and there isn't a simple way to compute it sanely and
efficiently, so no KNN support.
cheers
andrew
Attachments:
enum-btree-gist-gin.patchtext/x-patch; name=enum-btree-gist-gin.patchDownload+2927-2113
On 03/17/2016 06:44 AM, Andrew Dunstan wrote:
Here is a patch to add enum support to btree_gin and btree_gist. I
didn't include distance operations, as I didn't think it terribly
important, and there isn't a simple way to compute it sanely and
efficiently, so no KNN support.
This time including the data file for the gist regression tests.
cheers
andrew
Attachments:
enum-btree-gist-gin-2.patchtext/x-patch; name=enum-btree-gist-gin-2.patchDownload+3522-2113
On Thu, Mar 17, 2016 at 11:21 AM, Andrew Dunstan <andrew@dunslane.net> wrote:
On 03/17/2016 06:44 AM, Andrew Dunstan wrote:
Here is a patch to add enum support to btree_gin and btree_gist. I didn't
include distance operations, as I didn't think it terribly important, and
there isn't a simple way to compute it sanely and efficiently, so no KNN
support.This time including the data file for the gist regression tests.
Are you intending this as a 9.7 submission? Because it's pretty late for 9.6.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 03/18/2016 09:54 AM, Robert Haas wrote:
On Thu, Mar 17, 2016 at 11:21 AM, Andrew Dunstan <andrew@dunslane.net> wrote:
On 03/17/2016 06:44 AM, Andrew Dunstan wrote:
Here is a patch to add enum support to btree_gin and btree_gist. I didn't
include distance operations, as I didn't think it terribly important, and
there isn't a simple way to compute it sanely and efficiently, so no KNN
support.This time including the data file for the gist regression tests.
Are you intending this as a 9.7 submission? Because it's pretty late for 9.6.
I guess so. I would certainly find it hard to argue that it should be in
9.6 unless there is a consensus on it.
cheers
andrew
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Hi Andrew, all,
First message here! I didn't get around to sending an intro/"thank you all"
e-mail yet, and a small performance (?) patch+idea(s)... (CPU stuff, since
I don't otherwise know much about PG internals.) Anyway...
----- Original Message -----
From: "Andrew Dunstan"
Sent: Thursday, March 17, 2016
Here is a patch to add enum support to btree_gin and btree_gist. I
didn't include distance operations, as I didn't think it terribly
important, and there isn't a simple way to compute it sanely and
efficiently, so no KNN support.
Major thanks for coming up with this a week after your "enums and indexing"
message. My love of all things Postgres has a lot to do with being able to
do nearly anything one could want (coming from MySQL :-/)! So I was like,
"Wait, what?" when you brought up the subject, since this was something I
hadn't actually tried, but was planning to use btree_gin/gist with a good
mix of stuff, including enums (array and scalar).
At first I thought it was just arrays of enums, until this patch for the
contrib extensions (for the btree parts with GIN/GiST; plus GiST
exclusion?), and your blog posts... [1]http://adpgtech.blogspot.com/2016/03/gist-and-gin-support-for-enums.html[2]http://adpgtech.blogspot.com/2016/03/gin-indexing-array-of-enums.html I wasn't certain from the second
post whether your array solution can be used today without this patch (yes,
just tried 9.5). So, two separate issues, and the patch addresses scalar
stuff, IIUC.
It would be *really* nice to have this in 9.6. It seems it's simply filling
out functionality that should already be there, right? An oversight/bug fix
so it works "as advertised?" :-) Is any other btree type-compatibility
missing from these modules? (I notice the btree_gin docs don't mention
"numeric," but it works.)
I just looked over the patch, and the actual code addition/changes seem
pretty small and straightforward (or am I wrong?). And it's not changing
anything in the core, so...
Well, just wanted to argue my case! :^)
[1]: http://adpgtech.blogspot.com/2016/03/gist-and-gin-support-for-enums.html
[2]: http://adpgtech.blogspot.com/2016/03/gin-indexing-array-of-enums.html
cheers
andrew
Thanks,
Matt
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 03/24/2016 12:40 PM, Matt Wilmas wrote:
It would be *really* nice to have this in 9.6. It seems it's simply
filling out functionality that should already be there, right? An
oversight/bug fix so it works "as advertised?" :-)
I think that would be stretching the process a bit far. I'm certainly
not prepared to commit it unless there is a general consensus to make an
exception for it.
Is any other btree type-compatibility missing from these modules? (I
notice the btree_gin docs don't mention "numeric," but it works.)
uuid would be another type that should be fairly easily covered but isn't.
I just looked over the patch, and the actual code addition/changes
seem pretty small and straightforward (or am I wrong?).
Yes, sure, it's all fairly simple. Took me a little while to understand
what I was doing, but once I did it was pretty much plain sailing.
cheers
andrew
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 03/24/2016 12:40 PM, Matt Wilmas wrote:
(I notice the btree_gin docs don't mention "numeric," but it works.)
Numeric does work - we have regression tests to prove it, do we should
fix the docs. But I'm also curious to know why apparently we don't have
distance operator support for numeric.
cheers
andrew
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Here is a patch to add enum support to btree_gin and btree_gist. I didn't
include distance operations, as I didn't think it terribly important, and
there isn't a simple way to compute it sanely and efficiently, so no KNN
support.
I don't think we can implement a meaningful distance operator for enums.
This time including the data file for the gist regression tests.
It doesn't apply to HEAD anymore. I have tested it on b12fd41.
The GiST part of it doesn't really work. The current patch compares
oids. We need to change it to compare enumsortorder. I could make it
fail easily:
regression=# create type e as enum ('0', '2', '3');
CREATE TYPE
regression=# alter type e add value '1' after '0';
ALTER TYPE
regression=# create table t as select (i % 4)::text::e from generate_series(0, 100000) as i;
SELECT 100001
regression=# create index on t using gist (e);
SEGFAULT
+ALTER OPERATOR FAMILY gist_enum_ops USING gist ADD
Why don't we just create it with those changes?
+ * Use a similar trick to that used for numeric for enums, since we don't
Do you mean "similar trick that is used" or "trick similar to numeric"?
+ * actually know the leftmost value of any enum without knowing the concrete + * type, so we use a dummy leftmost value of InvalidOid.
+ return DatumGetBool( + DirectFunctionCall2(enum_gt, ObjectIdGetDatum(*((const Oid *) a)), ObjectIdGetDatum(*((const Oid *) b))) + );
Wouldn't it be nice to refactor enum_cmp_internal() to accept typcache
and use it there like the rangetypes?
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 11/04/2016 04:14 PM, Emre Hasegeli wrote:
Here is a patch to add enum support to btree_gin and btree_gist. I didn't
include distance operations, as I didn't think it terribly important, and
there isn't a simple way to compute it sanely and efficiently, so no KNN
support.I don't think we can implement a meaningful distance operator for enums.
Probably.
This time including the data file for the gist regression tests.
It doesn't apply to HEAD anymore. I have tested it on b12fd41.
I'll fix the bitrot..
The GiST part of it doesn't really work. The current patch compares
oids. We need to change it to compare enumsortorder. I could make it
fail easily:
ouch. Ok,I'll work on this.
+ALTER OPERATOR FAMILY gist_enum_ops USING gist ADD
Why don't we just create it with those changes?
I'll take a look.
+ * Use a similar trick to that used for numeric for enums, since we don't
Do you mean "similar trick that is used" or "trick similar to numeric"?
This is perfectly legal English. It means "... a similar trick to the
one used for numeric ... ". I'll change it to that if you think it's
clearer.
+ * actually know the leftmost value of any enum without knowing the concrete + * type, so we use a dummy leftmost value of InvalidOid. + return DatumGetBool( + DirectFunctionCall2(enum_gt, ObjectIdGetDatum(*((const Oid *) a)), ObjectIdGetDatum(*((const Oid *) b))) + );Wouldn't it be nice to refactor enum_cmp_internal() to accept typcache
and use it there like the rangetypes?
Not sure. Will look.
Thanks for the review.
cheers
andrew
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 11/04/2016 04:14 PM, Emre Hasegeli wrote:
The GiST part of it doesn't really work. The current patch compares
oids. We need to change it to compare enumsortorder. I could make it
fail easily:regression=# create type e as enum ('0', '2', '3');
CREATE TYPE
regression=# alter type e add value '1' after '0';
ALTER TYPE
regression=# create table t as select (i % 4)::text::e from generate_series(0, 100000) as i;
SELECT 100001
regression=# create index on t using gist (e);
SEGFAULT
It calls the enum routines which use the sortorder if necessary. It's
not necessary to use sortorder in the case of evenly numbered enum oids
as we have carefully arranged for them to be directly comparable, and
all the initially allocated oids are even, a very nifty efficiency
measure devised by Tom when we added enum extension.
The real problem here is that enum_cmp_internal assumes that
fcinfo->flinfo has been set up, and DirectFunctionCallN doesn't, it sets
it to NULL.
The patch below cures the problem. I'm not sure if there is a better
way. Thoughts?
cheers
andrew
diff --git a/src/backend/utils/adt/enum.c b/src/backend/utils/adt/enum.c
index 47d5355..64ba7a7 100644
--- a/src/backend/utils/adt/enum.c
+++ b/src/backend/utils/adt/enum.c
@@ -261,7 +261,7 @@ enum_send(PG_FUNCTION_ARGS)
static int
enum_cmp_internal(Oid arg1, Oid arg2, FunctionCallInfo fcinfo)
{
- TypeCacheEntry *tcache;
+ TypeCacheEntry *tcache = NULL;
/* Equal OIDs are equal no matter what */
if (arg1 == arg2)
@@ -277,7 +277,8 @@ enum_cmp_internal(Oid arg1, Oid arg2,
FunctionCallInfo fcinfo)
}
/* Locate the typcache entry for the enum type */
- tcache = (TypeCacheEntry *) fcinfo->flinfo->fn_extra;
+ if (fcinfo->flinfo != NULL)
+ tcache = (TypeCacheEntry *) fcinfo->flinfo->fn_extra;
if (tcache == NULL)
{
HeapTuple enum_tup;
@@ -296,7 +297,8 @@ enum_cmp_internal(Oid arg1, Oid arg2,
FunctionCallInfo fcinfo)
ReleaseSysCache(enum_tup);
/* Now locate and remember the typcache entry */
tcache = lookup_type_cache(typeoid, 0);
- fcinfo->flinfo->fn_extra = (void *) tcache;
+ if (fcinfo->flinfo != NULL)
+ fcinfo->flinfo->fn_extra = (void *) tcache;
}
/* The remaining comparison logic is in typcache.c */
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Andrew Dunstan <andrew@dunslane.net> writes:
The real problem here is that enum_cmp_internal assumes that
fcinfo->flinfo has been set up, and DirectFunctionCallN doesn't, it sets
it to NULL.
The patch below cures the problem. I'm not sure if there is a better
way. Thoughts?
That may be a good fix for robustness purposes, but it seems pretty horrid
from an efficiency standpoint. Where is this call, and should we be
modifying it to provide a flinfo?
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 11/05/2016 11:46 AM, Tom Lane wrote:
Andrew Dunstan <andrew@dunslane.net> writes:
The real problem here is that enum_cmp_internal assumes that
fcinfo->flinfo has been set up, and DirectFunctionCallN doesn't, it sets
it to NULL.
The patch below cures the problem. I'm not sure if there is a better
way. Thoughts?That may be a good fix for robustness purposes, but it seems pretty horrid
from an efficiency standpoint. Where is this call, and should we be
modifying it to provide a flinfo?
See attached updated patch that adds enum support to btree_gist and
btree_gin
I thought of providing an flinfo, but I couldn't see a simple way to do
it that would provide something much longer lived than the function
call, in which case it seemed a bit pointless. That's why I asked for
assistance :-)
cheers
andrew
Attachments:
enum-btree-gist-gin-3.patchtext/x-diff; name=enum-btree-gist-gin-3.patchDownload+4246-2113
Andrew Dunstan <andrew@dunslane.net> writes:
On 11/05/2016 11:46 AM, Tom Lane wrote:
That may be a good fix for robustness purposes, but it seems pretty horrid
from an efficiency standpoint. Where is this call, and should we be
modifying it to provide a flinfo?
I thought of providing an flinfo, but I couldn't see a simple way to do
it that would provide something much longer lived than the function
call, in which case it seemed a bit pointless. That's why I asked for
assistance :-)
Hmm. The problem is that the intermediate layer in btree_gist (and
probably btree_gin too, didn't look) does not pass through the
FunctionCallInfo data structure that's provided by the GIST AM.
That wasn't needed up to now, because none of the supported data types
are complex enough that any cached state would be useful, but trying
to extend it to enums exposes the shortcoming.
We could fix this, but it would be pretty invasive since it would require
touching just about every function in btree_gist/gin. Not entirely sure
that it's worth it. On the other hand, the problem might well come up
again in future, perhaps for a datatype where the penalty for lack of
a cache is greater --- eg, it would be pretty painful to support
record_cmp without caching. And the changes ought to be relatively
mechanical, even if they are extensive.
BTW, this patch would be a great deal shorter if you made use of the
work done in 40b449ae8. That is, you no longer need to replace the
base extension script --- just add an update script and change the
default version in the .control file. See fd321a1df for an example.
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 11/05/2016 01:13 PM, Tom Lane wrote:
Andrew Dunstan <andrew@dunslane.net> writes:
On 11/05/2016 11:46 AM, Tom Lane wrote:
That may be a good fix for robustness purposes, but it seems pretty horrid
from an efficiency standpoint. Where is this call, and should we be
modifying it to provide a flinfo?I thought of providing an flinfo, but I couldn't see a simple way to do
it that would provide something much longer lived than the function
call, in which case it seemed a bit pointless. That's why I asked for
assistance :-)Hmm. The problem is that the intermediate layer in btree_gist (and
probably btree_gin too, didn't look) does not pass through the
FunctionCallInfo data structure that's provided by the GIST AM.
That wasn't needed up to now, because none of the supported data types
are complex enough that any cached state would be useful, but trying
to extend it to enums exposes the shortcoming.We could fix this, but it would be pretty invasive since it would require
touching just about every function in btree_gist/gin. Not entirely sure
that it's worth it. On the other hand, the problem might well come up
again in future, perhaps for a datatype where the penalty for lack of
a cache is greater --- eg, it would be pretty painful to support
record_cmp without caching. And the changes ought to be relatively
mechanical, even if they are extensive.
Yeah. I think it's probably worth doing. btree_gin is probably a better
place to start because it's largely macro-ized.
I don't have time right now to do this. I'll try to get to it if nobody
else does over then next couple of months.
BTW, this patch would be a great deal shorter if you made use of the
work done in 40b449ae8. That is, you no longer need to replace the
base extension script --- just add an update script and change the
default version in the .control file. See fd321a1df for an example.
Oh, nice. My work was originally done in March, before this came in.
cheers
andrew
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
I won't have time to fix this before the end of the Commitfest
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Wed, Nov 23, 2016 at 11:31 AM, Andrew Dunstan <adunstan@postgresql.org>
wrote:
I won't have time to fix this before the end of the Commitfest
The patch is marked as "returned with feedback" in 2016-11 commitfest.
Please free to submit an updated patch to the next commitfest.
Regards,
Hari Babu
Fujitsu Australia
On 11/05/2016 03:11 PM, Andrew Dunstan wrote:
On 11/05/2016 01:13 PM, Tom Lane wrote:
Andrew Dunstan <andrew@dunslane.net> writes:
On 11/05/2016 11:46 AM, Tom Lane wrote:
That may be a good fix for robustness purposes, but it seems pretty
horrid
from an efficiency standpoint. Where is this call, and should we be
modifying it to provide a flinfo?I thought of providing an flinfo, but I couldn't see a simple way to do
it that would provide something much longer lived than the function
call, in which case it seemed a bit pointless. That's why I asked for
assistance :-)Hmm. The problem is that the intermediate layer in btree_gist (and
probably btree_gin too, didn't look) does not pass through the
FunctionCallInfo data structure that's provided by the GIST AM.
That wasn't needed up to now, because none of the supported data types
are complex enough that any cached state would be useful, but trying
to extend it to enums exposes the shortcoming.We could fix this, but it would be pretty invasive since it would
require
touching just about every function in btree_gist/gin. Not entirely sure
that it's worth it. On the other hand, the problem might well come up
again in future, perhaps for a datatype where the penalty for lack of
a cache is greater --- eg, it would be pretty painful to support
record_cmp without caching. And the changes ought to be relatively
mechanical, even if they are extensive.Yeah. I think it's probably worth doing. btree_gin is probably a
better place to start because it's largely macro-ized.
So looking at this we have:
static Datum
gin_btree_compare_prefix(FunctionCallInfo fcinfo)
{
Datum a = PG_GETARG_DATUM(0);
Datum b = PG_GETARG_DATUM(1);
QueryInfo *data = (QueryInfo *) PG_GETARG_POINTER(3);
int32 res,
cmp;
cmp = DatumGetInt32(DirectFunctionCall2Coll(
data->typecmp,
PG_GET_COLLATION(),
(data->strategy ==
BTLessStrategyNumber ||
data->strategy ==
BTLessEqualStrategyNumber)
? data->datum : a,
b));
and then the referred to routine in the enum case looks like this:
Datum
gin_enum_cmp(PG_FUNCTION_ARGS)
{
Oid a = PG_GETARG_OID(0);
Oid b = PG_GETARG_OID(1);
int res = 0;
if (ENUM_IS_LEFTMOST(a))
{
res = (ENUM_IS_LEFTMOST(b)) ? 0 : -1;
}
else if (ENUM_IS_LEFTMOST(b))
{
res = 1;
}
else
{
res = DatumGetInt32(DirectFunctionCall2(enum_cmp,
ObjectIdGetDatum(a),
ObjectIdGetDatum(b)));
}
PG_RETURN_INT32(res);
}
I'm not entirely sure how I should replace those DirectFunctionCall2 calls.
cheers
andrew
--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes:
I'm not entirely sure how I should replace those DirectFunctionCall2 calls.
Basically what we want is for the called function to be able to use the
fcinfo->flinfo->fn_extra and fcinfo->flinfo->fn_mcxt fields of the
FmgrInfo struct that GIN has set up for the btree_gin support function.
The fast but somewhat scary way to do it would just be to pass through
the flinfo pointer as-is. Imagine that fmgr.c grows a set of functions
like
Datum
DontKnowWhatToCallThisFunctionCall2(PGFunction func,
FmgrInfo *flinfo, Oid collation,
Datum arg1, Datum arg2)
{
FunctionCallInfoData fcinfo;
Datum result;
InitFunctionCallInfoData(fcinfo, flinfo, 2, collation, NULL, NULL);
fcinfo.arg[0] = arg1;
fcinfo.arg[1] = arg2;
fcinfo.argnull[0] = false;
fcinfo.argnull[1] = false;
result = (*func) (&fcinfo);
/* Check for null result, since caller is clearly not expecting one */
if (fcinfo.isnull)
elog(ERROR, "function %p returned NULL", (void *) func);
return result;
}
and then you'd just pass through the fcinfo->flinfo you got.
The reason this is kind of scary is that it's just blithely assuming
that the function won't look at the *other* fields of the FmgrInfo.
If it did, it would likely get very confused, since those fields
would be describing the GIN support function, not the function we're
calling.
We could alternatively have this trampoline function set up a fresh, local
FmgrInfo struct that it zeroes except for copying fn_extra and fn_mcxt
from the caller's struct, and then it copies fn_extra back again on the
way out. That's a few more cycles but it would be safer, I think; if the
function tried to look at the other fields such as fn_oid it would see
obviously bogus data.
BTW, while I'm looking at this ... isn't gin_enum_cmp broken on its face?
It's using DirectFunctionCall2 to call enum_cmp, and that's wrong because
DirectFunctionCall2 does not pass through a flinfo but enum_cmp needs to
have one. I've not tested, but I'm certain that this would dump core if
asked to compare odd-numbered enum OIDs.
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 02/23/2017 04:41 PM, Tom Lane wrote:
BTW, while I'm looking at this ... isn't gin_enum_cmp broken on its face?
It's using DirectFunctionCall2 to call enum_cmp, and that's wrong because
DirectFunctionCall2 does not pass through a flinfo but enum_cmp needs to
have one. I've not tested, but I'm certain that this would dump core if
asked to compare odd-numbered enum OIDs.
Yes, that's what I'm trying to fix.
cheers
andrew
--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 02/23/2017 04:41 PM, Tom Lane wrote:
Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes:
I'm not entirely sure how I should replace those DirectFunctionCall2 calls.
Basically what we want is for the called function to be able to use the
fcinfo->flinfo->fn_extra and fcinfo->flinfo->fn_mcxt fields of the
FmgrInfo struct that GIN has set up for the btree_gin support function.The fast but somewhat scary way to do it would just be to pass through
the flinfo pointer as-is. Imagine that fmgr.c grows a set of functions
like[...]
Here's a POC for btree_gin based on my original patch. I just made your
function static in btree_gin.c at least for now. I stayed with the
DirectFunctionCall2 use in the type-specific routines where calling
context wasn't needed (i.e. everything except enums). But it looks more
than ugly and highly invasive for btree_gist, and sadly that's my main
use case, exclusion constraints with enum fields.
cheers
andrew
--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services