GiST support for UUIDs

Started by Paul Jungwirthalmost 11 years ago13 messageshackers
Jump to latest
#1Paul Jungwirth
pj@illuminatedcomputing.com

Hello,

I'm interested in adding GiST support for the UUID column type from
the uuid-ossp extension. This has been requested and attempted before:

http://dba.stackexchange.com/questions/83604/optimizing-postgres-row-overlap-constraints-involving-uuids-and-gist
/messages/by-id/CAH3i69njJ9AX1fzHwt6uoUzCMBqnaDBwhmAPhRQzzLWifb2WOA@mail.gmail.com

I've used Postgres for a long time, but I've only dabbled a bit in the
source code. So I'm curious where this change would go? The btree_gist
extension? The uuid-ossp extension? Somewhere else?

If anyone has any advice I'd be grateful to hear it.

Thanks,
Paul

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Paul Jungwirth (#1)
Re: GiST support for UUIDs

Paul A Jungwirth <pj@illuminatedcomputing.com> writes:

I'm interested in adding GiST support for the UUID column type from
the uuid-ossp extension. This has been requested and attempted before:
I've used Postgres for a long time, but I've only dabbled a bit in the
source code. So I'm curious where this change would go? The btree_gist
extension? The uuid-ossp extension? Somewhere else?

btree_gist, I'd think, assuming you are only thinking of btree-equivalent
semantics and not anything more outre. uuid-ossp is only concerned with
UUID generation algorithms, not with storage or indexing. btree_gist
already has a bunch of infrastructure for this type of GiST opclass,
so it should be pretty straightforward to add it there (at least up to
making it work; the overhead of an extension version bump will probably
exceed the useful payload :-( ).

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3Paul Jungwirth
pj@illuminatedcomputing.com
In reply to: Tom Lane (#2)
Re: GiST support for UUIDs

On Thu, Jun 25, 2015 at 8:06 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Paul A Jungwirth <pj@illuminatedcomputing.com> writes:

I'm interested in adding GiST support for the UUID column type
. . . . So I'm curious where this change would go?

btree_gist, I'd think

Okay, thank you for your answer! I was worried about the effects of
having btree_gist depend on uuid-ossp. People won't have to say
`CREATE EXTENSION "uuid-ossp"` if they only want `btree_gist`, right?

the overhead of an extension version bump will probably
exceed the useful payload :-(

Sorry to put more work on your plate. :-) I'm trying to pick something
easy to get my feet wet.

Yours,
Paul

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Paul Jungwirth (#3)
Re: GiST support for UUIDs

Paul A Jungwirth <pj@illuminatedcomputing.com> writes:

On Thu, Jun 25, 2015 at 8:06 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Paul A Jungwirth <pj@illuminatedcomputing.com> writes:

I'm interested in adding GiST support for the UUID column type
. . . . So I'm curious where this change would go?

btree_gist, I'd think

Okay, thank you for your answer! I was worried about the effects of
having btree_gist depend on uuid-ossp. People won't have to say
`CREATE EXTENSION "uuid-ossp"` if they only want `btree_gist`, right?

No, the uuid type exists in core. It's only some creation functions that
are in that extension.

the overhead of an extension version bump will probably
exceed the useful payload :-(

Sorry to put more work on your plate. :-) I'm trying to pick something
easy to get my feet wet.

That work will be on your plate actually ;-). Read the docs concerning
creation of new extension versions, and perhaps look in our git history
for previous commits that have upgraded contrib extensions.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#5Paul Jungwirth
pj@illuminatedcomputing.com
In reply to: Tom Lane (#4)
Re: GiST support for UUIDs

Hello,

This patch adds UUID support to the `btree_gist` extension. I've also
submitted it to the September commitfest (hopefully correctly!)

Many people are looking for this feature, e.g.:

http://dba.stackexchange.com/questions/83604/optimizing-postgres-row-overlap-constraints-involving-uuids-and-gist
http://stackoverflow.com/questions/22720130/how-to-use-uuid-with-postgresql-gist-index-type
/messages/by-id/CAH3i69njJ9AX1fzHwt6uoUzCMBqnaDBwhmAPhRQzzLWifb2WOA@mail.gmail.com
/messages/by-id/C59F2565-4753-4C83-BDCD-A0F9430B1638@datafax.com

As seen from those links, indexing UUIDs in a GiST is necessary when
including a UUID as part of an exclusion constraint. Here is my note
to the HACKERS list proposing this addition:

/messages/by-id/CA+renyVepHxTO1c7dFbVjP1GYMUc0-3qDNWPN30-noo5MPyaVQ@mail.gmail.com

As discussed there, my changes are restricted to `contrib/btree_gist`.
I pretty much did a copy/paste of the code for intervals, since like
UUIDs they are 16 bytes. In some cases I could simplify, since
sometimes intervals can be less than 16 bytes, but not UUIDs.

This patch also includes tests and new files to bump the extension
version from 1.1 to 1.2.

One possible wart is that because the pg_uuid_t struct is defined in
uuid.c and hence invisible here, I had to repeat its definition (so
that I could include it in the GiST key struct). Alternately I could
move the definition in the core code into uuid.h, but my goal was to
touch only code in contrib. Let me know if you'd rather I made the
struct definition public and I can make that change.

This is my first patch, so my apologies if anything is missing. I went
the guidelines and I think I have everything covered. :-)

Thanks!
Paul Jungwirth

Show quoted text

On Thu, Jun 25, 2015 at 9:50 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Paul A Jungwirth <pj@illuminatedcomputing.com> writes:

On Thu, Jun 25, 2015 at 8:06 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Paul A Jungwirth <pj@illuminatedcomputing.com> writes:

I'm interested in adding GiST support for the UUID column type
. . . . So I'm curious where this change would go?

btree_gist, I'd think

Okay, thank you for your answer! I was worried about the effects of
having btree_gist depend on uuid-ossp. People won't have to say
`CREATE EXTENSION "uuid-ossp"` if they only want `btree_gist`, right?

No, the uuid type exists in core. It's only some creation functions that
are in that extension.

the overhead of an extension version bump will probably
exceed the useful payload :-(

Sorry to put more work on your plate. :-) I'm trying to pick something
easy to get my feet wet.

That work will be on your plate actually ;-). Read the docs concerning
creation of new extension versions, and perhaps look in our git history
for previous commits that have upgraded contrib extensions.

regards, tom lane

Attachments:

btree_gist_uuid_1.patchapplication/octet-stream; name=btree_gist_uuid_1.patchDownload+2732-1424
#6Michael Paquier
michael@paquier.xyz
In reply to: Paul Jungwirth (#5)
Re: GiST support for UUIDs

On Wed, Aug 19, 2015 at 1:25 PM, Paul A Jungwirth
<pj@illuminatedcomputing.com> wrote:

This is my first patch, so my apologies if anything is missing. I went
the guidelines and I think I have everything covered. :-)

I am moving this patch to next CF, removing Julien Rouhaud and Teodor
Sigaev as reviewers because they have not showed up once on this
thread for reviews.
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#7Adam Brusselback
adambrusselback@gmail.com
In reply to: Michael Paquier (#6)
Re: GiST support for UUIDs

So I apologize in advance if I didn't follow the processes exactly, I was
going to attempt to review this to move it along, but ran into issues
applying the patch cleanly to master. I fixed the issues I was having
applying it, and created a new patch (attached).

Managed to test it out after I got it applied, and it is working as
expected for exclusion constraints, as well as normal indexes.

This test was performed on Windows 10 (64 bit), and Postgres was compiled
using MSYS2.

Attachments:

btree_gist_uuid_7.patchapplication/octet-stream; name=btree_gist_uuid_7.patchDownload+1229-12
#8Haribabu Kommi
kommi.haribabu@gmail.com
In reply to: Adam Brusselback (#7)
Re: GiST support for UUIDs

On Wed, Nov 2, 2016 at 3:49 AM, Adam Brusselback <adambrusselback@gmail.com>
wrote:

So I apologize in advance if I didn't follow the processes exactly, I was
going to attempt to review this to move it along, but ran into issues
applying the patch cleanly to master. I fixed the issues I was having
applying it, and created a new patch (attached).

Managed to test it out after I got it applied, and it is working as
expected for exclusion constraints, as well as normal indexes.

This test was performed on Windows 10 (64 bit), and Postgres was compiled
using MSYS2

Thanks for the review. The proposed patch still applies cleanly. If you
don't have any
comments on the patch, please change the patch status as "ready for
committer" for
committer's attention. This will help us in smoother operation of
commitfest.

Regards,
Hari Babu
Fujitsu Australia

#9Tom Lane
tgl@sss.pgh.pa.us
In reply to: Adam Brusselback (#7)
Re: GiST support for UUIDs

Adam Brusselback <adambrusselback@gmail.com> writes:

[ btree_gist_uuid_7.patch ]

I spent awhile looking at this. I have exactly no faith that it won't
crash on alignment-picky hardware, because this declaration:

union pg_uuid_t
{
unsigned char data[UUID_LEN];
uint64 v64[2];
};

is not the same as the existing pg_uuid_t; it instructs the compiler
that union pg_uuid_t must be double-aligned. The existing type is
only byte-aligned, and is declared that way in pg_type, which means
that values on-disk are quite likely not to meet the alignment
expectation.

I spent a little bit of time trying to get the patch to crash on a PPC
machine, without success, but the compiler I have on that (gcc 4.0.1) is
very old and is not aggressive about things like optimizing memcpy calls
on supposedly-aligned arguments. I think a modern gcc version on such
hardware would probably generate code that fails. (Also, PPC is
big-endian, so some of the at-risk code isn't used; a picky little-endian
machine would have more scope for crashing. Not real sure, but I think
ARM might be like that.)

What I would suggest is that you forget the union hack and just use
memcmp in all the comparison functions. It's not likely to be worth
the trouble to try to get those calls to be safely aligned. The
only place where you need go to any trouble is in uuid_parts_distance,
which could be redesigned to memcpy a not-necessarily-aligned source
into a local uint64[2] array and then byte-swap if needed.

(BTW, considering that we're going to return a float distance that has
only got twenty-something significant bits, do we *really* need to deal
with do-it-yourself int128 arithmetic in uuid_parts_distance? Color
me skeptical.)

Also, I can't say that I think it's good style to be duplicating the
declaration of pg_uuid_t out of uuid.c. (And it absolutely, positively,
is vile style to have two different declarations of the same struct in
different files, quite aside from whether they're ABI-compatible or not.)

I don't entirely see the point of making pg_uuid_t an opaque struct when
the only interesting thing about it, namely UUID_LEN, is exposed anyway.
So my inclination would be to just do

typedef struct pg_uuid_t
{
unsigned char data[UUID_LEN];
} pg_uuid_t;

in uuid.h and forget the idea of it being opaque.

Also, the patch could be made a good deal smaller (and easier to review)
in the wake of commit 40b449ae8: you no longer need to convert
btree_gist--1.2.sql into btree_gist--1.3.sql, just leave it alone and add
btree_gist--1.2--1.3.sql. That way we don't have to worry about whether
the upgrade script matches the change in the base script.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#10Chris Bandy
bandy.chris@gmail.com
In reply to: Tom Lane (#9)
Re: GiST support for UUIDs

On Mon, Nov 28, 2016 at 4:04 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

What I would suggest is that you forget the union hack and just use
memcmp in all the comparison functions. It's not likely to be worth
the trouble to try to get those calls to be safely aligned. The
only place where you need go to any trouble is in uuid_parts_distance,
which could be redesigned to memcpy a not-necessarily-aligned source
into a local uint64[2] array and then byte-swap if needed.

Done. I only have one architecture to test on (Linux, Intel x86_64) so
I must defer to others in this regard.

I don't entirely see the point of making pg_uuid_t an opaque struct when
the only interesting thing about it, namely UUID_LEN, is exposed anyway.
So my inclination would be to just do

typedef struct pg_uuid_t
{
unsigned char data[UUID_LEN];
} pg_uuid_t;

in uuid.h and forget the idea of it being opaque.

Done.

Also, the patch could be made a good deal smaller (and easier to review)
in the wake of commit 40b449ae8: you no longer need to convert
btree_gist--1.2.sql into btree_gist--1.3.sql, just leave it alone and add
btree_gist--1.2--1.3.sql. That way we don't have to worry about whether
the upgrade script matches the change in the base script.

Done.

-- Chris

Attachments:

btree_gist_uuid_8.patchapplication/octet-stream; name=btree_gist_uuid_8.patchDownload+1128-20
#11Tom Lane
tgl@sss.pgh.pa.us
In reply to: Chris Bandy (#10)
Re: GiST support for UUIDs

Chris Bandy <bandy.chris@gmail.com> writes:

[ btree_gist_uuid_8.patch ]

Um ... is there a reason why the penalty logic in gbt_uuid_penalty()
is completely unlike that for any other btree_gist module?

As best I can tell from the (admittedly documentation-free) code
elsewhere, the penalty ought to be proportional to the fraction
by which the original range is expanded; that's not what this
code is doing. It also seems to be missing the machinations related
to scaling per-column results in a multi-column index.

I'm kind of inclined to change uuid_parts_distance to just convert
a given pg_uuid_t to "double" and then apply penalty_num(), as is
done in gbt_macad_penalty.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#12Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#11)
Re: GiST support for UUIDs

I wrote:

I'm kind of inclined to change uuid_parts_distance to just convert
a given pg_uuid_t to "double" and then apply penalty_num(), as is
done in gbt_macad_penalty.

Pushed with that change and some other mostly-cosmetic tweaking.

One not too cosmetic fix was that gbt_uuid_union was declared with the
wrong return type. That's probably mostly harmless given that core GiST
pays little attention to the declared signatures of the support functions,
but it's not a good thing. This would've been caught if anyone had
thought to run the amvalidate functions on the updated extension.
I think I will go and put a call to that into the regression tests of
all the contrib modules that define new opclasses.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#13Chris Bandy
bandy.chris@gmail.com
In reply to: Tom Lane (#12)
Re: GiST support for UUIDs

On Tue, Nov 29, 2016 at 1:13 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Pushed with that change and some other mostly-cosmetic tweaking.

Thank you for addressing all those issues, Tom! I tested some
exclusion constraints that are interesting to me, and everything seems
to be working well.

-- Chris

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers