GiST support for inet datatypes

Started by Emre Hasegeliover 12 years ago49 messageshackers

emre@hasegeli.com

over 12 years ago

Hi,

Attached patch adds GiST support to the inet datatypes with two new
operators. Overlaps operator can be used with exclusion constraints.
Is adjacent to operator is just the negator of it. Index uses only
the network bits of the addresses. Except for the new operators and
is contained within, contains; basic comparison operators are also
supported.

Query planner never chooses to use the index for the operators which
the index is particularly useful because selectivity estimation functions
are missing. I am planning to work on them.

I also wanted to add strictly left of and strictly right of operators
but I did not want to introduce new symbols. I think we need style
guidelines for them. Range types use <@ and @> for is contained within
and contains operators; << and >> for strictly left of and strictly right of
operators. It would be nice if we could change the symbols for contains
and is contained within operators of the inet datatypes. Then we could
use the old ones for strictly left of and strictly right of operators.

I did not touch opr_sanity regression tests as I did not decide
how to solve these problems. I did not add documentation except
the new operators. It would be nice to mention the index and exclusion
constraints for inet datatypes somewhere. I did not know which page
would be more suitable.

David Fetter

david@fetter.org

over 12 years ago

In reply to: Emre Hasegeli (#1)

Re: GiST support for inet datatypes

On Tue, Dec 17, 2013 at 08:58:13PM +0200, Emre Hasegeli wrote:

Hi,

Attached patch adds GiST support to the inet datatypes with two new
operators. Overlaps operator can be used with exclusion constraints.
Is adjacent to operator is just the negator of it. Index uses only
the network bits of the addresses. Except for the new operators and
is contained within, contains; basic comparison operators are also
supported.

Please add this patch to the upcoming Commitfest at
https://commitfest.postgresql.org/action/commitfest_view?id=21

Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Emre Hasegeli

emre@hasegeli.com

over 12 years ago

In reply to: Emre Hasegeli (#1)

Re: GiST support for inet datatypes

2013-12-17 Emre Hasegeli <emre@hasegeli.com>:

Query planner never chooses to use the index for the operators which
the index is particularly useful because selectivity estimation functions
are missing. I am planning to work on them.

Attached patch adds selectivity estimation functions for the overlap and
adjacent operators. Other operators need a bit more work. I want to send it
before to get some feedback.

Andreas Karlsson

andreas.karlsson@percona.com

over 12 years ago

In reply to: Emre Hasegeli (#1)

Re: GiST support for inet datatypes

Hi,

I will review your two patches (gist support + selectivity). This is
part 1 of my review. I will look more into the actual GiST
implementation in a couple of days, but thought I could provide you with
my initial input right away.

inet-gist
---------

General:

I like the idea of the patch and think the && operator is useful for
exclusion constraints, and that indexing the contains operator is useful
for IP-address lookups. There is an extension, ip4r, which adds a GiST
indexed type for IP ranges but since we have the inet type in core I
think it should have GiST indexes.

I am not convinced an adjacent operator is useful for the inet type, but
if it is included it should be indexed just like -|- of ranges. We
should try to keep these lists of indexed operators the same.

Compilation:

Compiled without errors.

Regression tests:

One of the broken regression tests seems unrelated to the selectivity
functions.

-- Make a list of all the distinct operator names being used in particular
-- strategy slots.

I think it would be fine just to add the newly indexed operators here,
but the more indexed operators we get in the core the less useful this
test becomes.

Source code:

The is adjacent to operator, -|-, does not seem to be doing the right
thing. In the code it is implemented as the inverse of && which is not
true. The below comparison should return false.

SELECT '10.0.0.1'::inet -|- '127.0.0.1'::inet;
?column?
----------
t
(1 row)

I am a bit suspicious about your memcmp based optimization in
bitncommon, but it could be good. Have you benchmarked it compared to
doing the same thing with a loop?

Documentation:

Please use examples in the operator table which will evaluate to true,
and for the && case an example where not both sides are the same.

I have not found a place either in the documentation where it is
documented which operators are documented. I would suggest just adding a
short note after the operators table.

inet-selfuncs
-------------

Compilation:

There were some new warnings caused by this patch (with gcc 4.8.2). The
warnings were all about the use of uninitialized variables (right,
right_min_bits, right_order) in inet_hist_overlap_selectivity. Looking
at the code I see that they are harmless but you should still rewrite
the loop to silence the warnings.

Regression tests:

There are two broken

-- Insist that all built-in pg_proc entries have descriptions

Here you should just add descriptions for the new functions in pg_proc.h.

-- Check that all opclass search operators have selectivity estimators.

Fails due to missing selectivity functions for the operators. I think
you should at least for now do like the range types and use areajoinsel,
contjoinsel, etc for the join selectivity estimation.

Source code:

The selectivity estimation functions, network_overlap_selectivity and
network_adjacent_selectivity, should follow the naming conventions of
the other selectivity estimation functions. They are named things like
tsmatchsel, tsmatchjoinsel, and rangesel.

--
Andreas Karlsson

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Emre Hasegeli

emre@hasegeli.com

over 12 years ago

In reply to: Andreas Karlsson (#4)

Re: GiST support for inet datatypes

2014-01-19 Andreas Karlsson <andreas@proxel.se>:

Hi,

I will review your two patches (gist support + selectivity). This is part 1
of my review. I will look more into the actual GiST implementation in a
couple of days, but thought I could provide you with my initial input right
away.

Thank you for looking at it.

inet-gist
---------

General:

I like the idea of the patch and think the && operator is useful for
exclusion constraints, and that indexing the contains operator is useful for
IP-address lookups. There is an extension, ip4r, which adds a GiST indexed
type for IP ranges but since we have the inet type in core I think it should
have GiST indexes.

I am not convinced an adjacent operator is useful for the inet type, but if
it is included it should be indexed just like -|- of ranges. We should try
to keep these lists of indexed operators the same.

I added it just not to leave negotor field empty. It can also be useful with
exclusion constraints but not with GiST support. I did not add GiST support
for it and the not equals operator because they do not fit the index
structure. I can just remove the operator for now.

Compilation:

Compiled without errors.

Regression tests:

One of the broken regression tests seems unrelated to the selectivity
functions.

-- Make a list of all the distinct operator names being used in particular
-- strategy slots.

I think it would be fine just to add the newly indexed operators here, but
the more indexed operators we get in the core the less useful this test
becomes.

I did not add the new operators to the list because I do not feel right
about supporting <<, <<=, >>, >>= symbols for these operators.
They should be <@, <@=, @>, @>= to be consistent with other types.

I am a bit suspicious about your memcmp based optimization in bitncommon,
but it could be good. Have you benchmarked it compared to doing the same
thing with a loop?

I did, when I was writing that part. I will be happy to do it again. I will
post the results.

Documentation:

Please use examples in the operator table which will evaluate to true, and
for the && case an example where not both sides are the same.

I will change in the next version.

I have not found a place either in the documentation where it is documented
which operators are documented. I would suggest just adding a short note
after the operators table.

I will add in the next version.

inet-selfuncs
-------------

Compilation:

There were some new warnings caused by this patch (with gcc 4.8.2). The
warnings were all about the use of uninitialized variables (right,
right_min_bits, right_order) in inet_hist_overlap_selectivity. Looking at
the code I see that they are harmless but you should still rewrite the loop
to silence the warnings.

I will fix in the next version.

Regression tests:

There are two broken

-- Insist that all built-in pg_proc entries have descriptions

Here you should just add descriptions for the new functions in pg_proc.h.

I will add in the next version.

-- Check that all opclass search operators have selectivity estimators.

Fails due to missing selectivity functions for the operators. I think you
should at least for now do like the range types and use areajoinsel,
contjoinsel, etc for the join selectivity estimation.

I did not support them in the first version because I could not decide
the way. I have better options than using the the geo_selfuncs for these
operators. The options are from simple to complex:

0) just use network_overlap_selectivity
1) fix network_overlap_selectivity with a constant between 0 and 1
2) calculate this constant by using masklens of all buckets of the histogram
3) calculate this constant by using masklens of matching buckets of
the histogram
4) store STATISTIC_KIND_RANGE_LENGTH_HISTOGRAM for this
types, calculate this constant with it
5) create another kind of histogram for masklens, calculate this constant
with it

I do not know how to do 4 or 5, so I will try 3 for the next version. Do you
think it is a good idea?

Source code:

The selectivity estimation functions, network_overlap_selectivity and
network_adjacent_selectivity, should follow the naming conventions of the
other selectivity estimation functions. They are named things like
tsmatchsel, tsmatchjoinsel, and rangesel.

I will rename it to inetoverlapsel, then.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Andreas Karlsson

andreas.karlsson@percona.com

over 12 years ago

In reply to: Emre Hasegeli (#5)

Re: GiST support for inet datatypes

On 01/19/2014 11:10 AM, Emre Hasegeli wrote:

I am not convinced an adjacent operator is useful for the inet type, but if
it is included it should be indexed just like -|- of ranges. We should try
to keep these lists of indexed operators the same.

I added it just not to leave negotor field empty. It can also be useful with
exclusion constraints but not with GiST support. I did not add GiST support
for it and the not equals operator because they do not fit the index
structure. I can just remove the operator for now.

Sounds good. None of the other && implementations have a negator.

I think it would be fine just to add the newly indexed operators here, but
the more indexed operators we get in the core the less useful this test
becomes.

I did not add the new operators to the list because I do not feel right
about supporting <<, <<=, >>, >>= symbols for these operators.
They should be <@, <@=, @>, @>= to be consistent with other types.

I agree, but I am not sure if it is ok to break backward compatibility here.

-- Check that all opclass search operators have selectivity estimators.

Fails due to missing selectivity functions for the operators. I think you
should at least for now do like the range types and use areajoinsel,
contjoinsel, etc for the join selectivity estimation.

I did not support them in the first version because I could not decide
the way. I have better options than using the the geo_selfuncs for these
operators. The options are from simple to complex:

0) just use network_overlap_selectivity
1) fix network_overlap_selectivity with a constant between 0 and 1
2) calculate this constant by using masklens of all buckets of the histogram
3) calculate this constant by using masklens of matching buckets of
the histogram
4) store STATISTIC_KIND_RANGE_LENGTH_HISTOGRAM for this
types, calculate this constant with it
5) create another kind of histogram for masklens, calculate this constant
with it

I do not know how to do 4 or 5, so I will try 3 for the next version. Do you
think it is a good idea?

Solutions 0 and 1 does not really sound better than using geo_selfuncs.

--
Andreas Karlsson

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Andreas Karlsson

andreas.karlsson@percona.com

over 12 years ago

In reply to: Andreas Karlsson (#4)

Re: GiST support for inet datatypes

Here comes part 2 of 2 of the review.

inet-gist
---------

In general the code looks good and I think your approach makes sense.
Not an expert on GiST though so I would like a second opinion on this.
Maybe there is some clever trick which would make the index more
efficient, but the design I see is simple and logical. Like this much
more than the incorrect GiST index for inet in btree_gist.

The only thing is that I think your index would do poorly on the case
where all values share a prefix before the netmask but have different
values after the netmask, since gist_union and gist_penalty does not
care about the bits after the netmask. Am I correct? Have you thought
anything about this?

To create such data:

CREATE TABLE a (ip inet);
INSERT INTO a SELECT '127.0.0.0/16'::inet + s FROM generate_series(1,
pow(2, 16)::int) s;
CREATE INDEX aidx ON a USING gist (ip);

Saw also some minor things too.

Typo in comment, weird sentence:

* The main question to calculate the union is that they have how many
* bits in common. [...]

I do not like how you called the result key i inet_union_gist() "tmp".
Something like "result" or "result_ip" would be better.

Typo in comment, "reverse" should be "inverse":

* Penalty is reverse of the common IP bits of the two addresses.

Typo in comment, missing "be":

* Values bigger than 1 are used when the common IP bits cannot
* calculated.

Language can be improved in this comment. Both ways to split are by the
union of the keys, it is more of a split by address prefix.

* The second and the important way is to split by the union of the keys.
* Union of the keys calculated same way with the inet_gist_union function.
* The first and the last biggest subnets created from the calculated
* union. In this case addresses contained by the first subnet will be put
* to the left bucket, address contained by the last subnet will be put to
* the right bucket.

Could you not just use memcmp in inet_gist_same() instead of bitncmp()
since it only cares about equality?

There is an extra empty line at the end of network_gist.c

inet-selfuncs
-------------

Here I have to honestly admit I am not sure if I can tell if your
selectivity function is correct. Your explanation sounds convincing and
the general idea of looking at the histogram is correct. I am just have
no idea if the details are right.

DEFAULT_NETWORK_OVERLAP_SELECTIVITY should be renamed
DEFAULT_NETWORK_OVERLAP_SEL and moved to the .c file.

Typo in comment, "constrant" -> "constant".

There is an extra empty line at the end of network_selfuncs.c

--
Andreas Karlsson

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Emre Hasegeli

emre@hasegeli.com

over 12 years ago

In reply to: Andreas Karlsson (#7)

Re: GiST support for inet datatypes

2014/1/20 Andreas Karlsson <andreas@proxel.se>:

Here comes part 2 of 2 of the review.

Second versions of the patches attached. They address both of your reviews.

inet-gist
---------

In general the code looks good and I think your approach makes sense. Not an
expert on GiST though so I would like a second opinion on this. Maybe there
is some clever trick which would make the index more efficient, but the
design I see is simple and logical. Like this much more than the incorrect
GiST index for inet in btree_gist.

I realized that create extension btree_gist fails with the patch.

ERROR: could not make operator class "gist_inet_ops" be default for type inet
DETAIL: Operator class "inet_ops" already is the default.

I think making the new operator class default makes more sense. Name conflict
can be a bigger problem. Should I rename the new operator class?

The only thing is that I think your index would do poorly on the case where
all values share a prefix before the netmask but have different values after
the netmask, since gist_union and gist_penalty does not care about the bits
after the netmask. Am I correct? Have you thought anything about this?

Yes, I have tried to construct the index with ip_maxbits() instead of ip_bits()
but I could not make it work with the cidr type. It can be done by adding
operator as a parameter for union, penalty and picksplit functions on the
GiST framework. I am not sure it worths the trouble. It would only help
the basic comparison operators and it would slow down other operators because
network length comparison before network address would not be possible for
the intermediate nodes. I mentioned this behavior of the operator class on
the paragraph added to the documentation.

Saw also some minor things too.

Typo in comment, weird sentence:

* The main question to calculate the union is that they have how many
* bits in common. [...]

I rewrite it. I am sorry about my English.

I do not like how you called the result key i inet_union_gist() "tmp".
Something like "result" or "result_ip" would be better.

Typo in comment, "reverse" should be "inverse":

* Penalty is reverse of the common IP bits of the two addresses.

Typo in comment, missing "be":

* Values bigger than 1 are used when the common IP bits cannot
* calculated.

Language can be improved in this comment. Both ways to split are by the
union of the keys, it is more of a split by address prefix.

* The second and the important way is to split by the union of the keys.
* Union of the keys calculated same way with the inet_gist_union function.
* The first and the last biggest subnets created from the calculated
* union. In this case addresses contained by the first subnet will be put
* to the left bucket, address contained by the last subnet will be put to
* the right bucket.

Could you not just use memcmp in inet_gist_same() instead of bitncmp() since
it only cares about equality?

There is an extra empty line at the end of network_gist.c

I changed them.

inet-selfuncs
-------------

Here I have to honestly admit I am not sure if I can tell if your
selectivity function is correct. Your explanation sounds convincing and the
general idea of looking at the histogram is correct. I am just have no idea
if the details are right.

I tried to improve it in this version. I hope it is more readable now. I used
the word inclusion instead of overlap, made some changes on the algorithm
to make it more suitable to the other operators.

Using the histogram for inclusion which is originally for basic comparison,
is definitely not correct. It is an empirical approach. I have tried several
versions of it, tested them with different data sets and thought it is better
than nothing.

DEFAULT_NETWORK_OVERLAP_SELECTIVITY should be renamed
DEFAULT_NETWORK_OVERLAP_SEL and moved to the .c file.

Typo in comment, "constrant" -> "constant".

There is an extra empty line at the end of network_selfuncs.c

I changed them.

Andreas Karlsson

andreas.karlsson@percona.com

over 12 years ago

In reply to: Emre Hasegeli (#8)

Re: GiST support for inet datatypes

On 01/23/2014 11:22 PM, Emre Hasegeli wrote:

inet-gist
---------
I realized that create extension btree_gist fails with the patch.

ERROR: could not make operator class "gist_inet_ops" be default for type inet
DETAIL: Operator class "inet_ops" already is the default.

I think making the new operator class default makes more sense. Name conflict
can be a bigger problem. Should I rename the new operator class?

I agree that the new operator class should be the default one, it is
more useful and also not buggy. No idea about the naming.

The only thing is that I think your index would do poorly on the case where
all values share a prefix before the netmask but have different values after
the netmask, since gist_union and gist_penalty does not care about the bits
after the netmask. Am I correct? Have you thought anything about this?

Yes, I have tried to construct the index with ip_maxbits() instead of ip_bits()
but I could not make it work with the cidr type. It can be done by adding
operator as a parameter for union, penalty and picksplit functions on the
GiST framework. I am not sure it worths the trouble. It would only help
the basic comparison operators and it would slow down other operators because
network length comparison before network address would not be possible for
the intermediate nodes. I mentioned this behavior of the operator class on
the paragraph added to the documentation.

I think this is fine since it does not seem like a very useful case to
support to me. Would be worth doing only if it is easy to do.

A separate concern: we still have not come to a decision about operators
for inet here. I do not like the fact that the operators differ between
ranges and inet. But I feel this might be out of scope for this patch.

Does any third party want to chime in with an opinion?

Current inet/cidr
<< is contained within
<<= is contained within or equals

contains
= contains or equals

Range types
@> contains range
<@ range is contained by
&& overlap (have points in common)
<< strictly left of

strictly right of

inet-selfuncs
-------------

Here I have to honestly admit I am not sure if I can tell if your
selectivity function is correct. Your explanation sounds convincing and the
general idea of looking at the histogram is correct. I am just have no idea
if the details are right.

I tried to improve it in this version. I hope it is more readable now. I used
the word inclusion instead of overlap, made some changes on the algorithm
to make it more suitable to the other operators.

Using the histogram for inclusion which is originally for basic comparison,
is definitely not correct. It is an empirical approach. I have tried several
versions of it, tested them with different data sets and thought it is better
than nothing.

Thanks for the updates. The code in this version is cleaner and easier
to follow.

I am not convinced of your approach to calculating the selectivity from
the histogram. The thing I have the problem with is the clever trickery
involved with how you handle different operator types. I prefer the
clearer code of the range types with how calc_hist_selectivity_scalar is
used. Is there any reason for why that approach would not work here or
result in worse code?

I have attached a patch where I improved the language of your comment
describing the workings of the selectivity estimation. Could you please
check it so I did not change the meaning of any of it?

I see from the tests that you still are missing selectivity functions
for operators, what is your plan for this?

--
Andreas Karlsson

#10

Emre Hasegeli

emre@hasegeli.com

over 12 years ago

In reply to: Emre Hasegeli (#5)

Re: GiST support for inet datatypes

2014-01-19 12:10, Emre Hasegeli <emre@hasegeli.com>:

2014-01-19 Andreas Karlsson <andreas@proxel.se>:

I am a bit suspicious about your memcmp based optimization in bitncommon,
but it could be good. Have you benchmarked it compared to doing the same
thing with a loop?

I did, when I was writing that part. I will be happy to do it again. I will
post the results.

I was testing it by creating GiST indexes. I realized that these test are
inconsistent when BUFFERING = AUTO. I repeated them with BUFFERING = ON.
The function without memcmp was faster in this case. I will change
the function in the next version of the patch.

The test case:

Create table Network as
select (a || '.' || b || '.' || c || '/24')::cidr
from generate_series(0, 255) as a,
generate_series(0, 255) as b,
generate_series(0, 255) as c;

Drop index if exists N;
Create index N on Network using gist(cidr) with (buffering = on);

Create table Network6 as
select ('::' || to_hex(a) || ':' || to_hex(b))::inet
from generate_series(0, 255) as a,
generate_series(0, 65535) as b;

Drop index if exists N6;
Create index N6 on Network6 using gist(inet) with (buffering = on);

What I could not understand is the tests with IP version 6 was much faster.
The row count is same, the index size is bigger.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#11

Emre Hasegeli

emre@hasegeli.com

over 12 years ago

In reply to: Andreas Karlsson (#9)

Re: GiST support for inet datatypes

Third versions of the patches attached. They are rebased to the HEAD. In
this versions, the bitncommon function is changed. <sys/socket.h> included
to network_gist.c to be able to compile it on FreeBSD. Geometric mean
calculation for partial bucket match on the function
inet_hist_inclusion_selectivity
reverted back. It was something I changed without enough testing on
the second revision of the patch. This version uses the maximum divider
calculated from the boundaries of the bucket, like the first version. It is
simpler and more reliable.

2014-01-31 3:58, Andreas Karlsson <andreas@proxel.se>:

I agree that the new operator class should be the default one, it is more
useful and also not buggy. No idea about the naming.

I have misread the name, rename was not necessary. I removed the DEFAULT
keywords for the inet and the cidr operator classes from btree_gist--1.0.sql
on the inet-gist patch.

A separate concern: we still have not come to a decision about operators for
inet here. I do not like the fact that the operators differ between ranges
and inet. But I feel this might be out of scope for this patch.

I attached a separate optional patch to rename the inclusion operators. It
leaves the old names, changes the GiST supported operators and the operator
names in the documentation. It would be better to apply it with the GiST
support, in my opinion.

I am not convinced of your approach to calculating the selectivity from the
histogram. The thing I have the problem with is the clever trickery involved
with how you handle different operator types. I prefer the clearer code of
the range types with how calc_hist_selectivity_scalar is used. Is there any
reason for why that approach would not work here or result in worse code?

Currently we do not have histogram of the lower and upper bounds as
the range types. Current histogram can be used nicely as the lower bound,
but not the upper bound because the comparison is first on the common bits
of the network part, then on the length of the network part. For example,
10.0/16 is defined as greater than 10/8.

Using the histogram as the lower bounds of the networks is not enough to
calculate selectivity for any of these operators. Using it also as the upper
bounds is still not enough for the inclusion operators. The lengths of
the network parts should taken into consideration in a way and it is
what this patch does. Using separate histograms for the lower bounds,
the upper bounds and the lengths of the network parts can solve all of these
problems, but it is a lot of work.

I have attached a patch where I improved the language of your comment
describing the workings of the selectivity estimation. Could you please
check it so I did not change the meaning of any of it?

Thank you. I only added the "for" to the sentence "This ratio can be big
enough to not disregard for addresses with small masklens." I merged
the changes to this version of the patch.

I see from the tests that you still are missing selectivity functions for
operators, what is your plan for this?

This was because the join selectivity estimation functions. I set
the geo_selfuncs for the missing ones. All tests pass with them. I want
to develop the join selectivity function too, but not for this commit fest.

#12

Robert Haas

robertmhaas@gmail.com

over 12 years ago

In reply to: Emre Hasegeli (#11)

Re: GiST support for inet datatypes

On Thu, Feb 6, 2014 at 12:14 PM, Emre Hasegeli <emre@hasegeli.com> wrote:

I have misread the name, rename was not necessary. I removed the DEFAULT
keywords for the inet and the cidr operator classes from btree_gist--1.0.sql
on the inet-gist patch.

Generally, modifying already-release .sql files for extensions is a no-no...

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#13

Emre Hasegeli

emre@hasegeli.com

over 12 years ago

In reply to: Robert Haas (#12)

Re: GiST support for inet datatypes

2014-02-07 22:41, Robert Haas <robertmhaas@gmail.com>:

Generally, modifying already-release .sql files for extensions is a no-no...

I prepared separate patches for btree_gist extension with more options.
First one (btree-gist-drop-default-inet-v1.patch) removes DEFAULT keyword
only from the inet and the cidr operator classes. Second one
(btree-gist-drop-default-all-v1.patch) removes DEFAULT keyword for all
operator classes. I think it is more consistent to remove it from all.
Third one (btree-gist-drop-inet-v1.patch) removes the inet and the cidr
operator classes altogether. It is suggested by Tom Lane [1]/messages/by-id/10183.1287526949@sss.pgh.pa.us on bug #5705.
The new GiST operator class includes basic comparison operators except !=
so it may be the right time to remove support from btree_gist. Fourth one
(btree-gist-drop-inet-and-default-v1.patch) is the second one and the third
one together.

[1]: /messages/by-id/10183.1287526949@sss.pgh.pa.us

#14

Andres Freund

andres@anarazel.de

over 12 years ago

In reply to: Emre Hasegeli (#13)

Re: GiST support for inet datatypes

On 2014-02-17 14:40:07 +0200, Emre Hasegeli wrote:

2014-02-07 22:41, Robert Haas <robertmhaas@gmail.com>:

Generally, modifying already-release .sql files for extensions is a no-no...

I prepared separate patches for btree_gist extension with more options.
First one (btree-gist-drop-default-inet-v1.patch) removes DEFAULT keyword
only from the inet and the cidr operator classes. Second one
(btree-gist-drop-default-all-v1.patch) removes DEFAULT keyword for all
operator classes. I think it is more consistent to remove it from all.
Third one (btree-gist-drop-inet-v1.patch) removes the inet and the cidr
operator classes altogether. It is suggested by Tom Lane [1] on bug #5705.
The new GiST operator class includes basic comparison operators except !=
so it may be the right time to remove support from btree_gist. Fourth one
(btree-gist-drop-inet-and-default-v1.patch) is the second one and the third
one together.

[1] /messages/by-id/10183.1287526949@sss.pgh.pa.us

diff --git a/contrib/btree_gist/Makefile b/contrib/btree_gist/Makefile
index ba4af14..d5b1fd7 100644
--- a/contrib/btree_gist/Makefile
+++ b/contrib/btree_gist/Makefile
@@ -9,7 +9,7 @@ OBJS =  btree_gist.o btree_utils_num.o btree_utils_var.o btree_int2.o \
btree_numeric.o

EXTENSION = btree_gist
-DATA = btree_gist--1.0.sql btree_gist--unpackaged--1.0.sql
+DATA = btree_gist--1.1.sql btree_gist--unpackaged--1.0.sql

You need to add a file for going from 1.0 to 1.1.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#15

Emre Hasegeli

emre@hasegeli.com

over 12 years ago

In reply to: Andres Freund (#14)

Re: GiST support for inet datatypes

2014-02-17 14:54, Andres Freund <andres@2ndquadrant.com>:

You need to add a file for going from 1.0 to 1.1.

Thank you for the notice. I added them to the patches which touch only two
of the operator classes. It drops and re-creates operator classes as there
is not ALTER OPERATOR CLASS DROP DEFAULT command. I do not apply it to
the patch to remove the DEFAULT keyword from all of the operator classes
because it would nearly be same as dropping and re-creating the extension.

#16

Tom Lane

tgl@sss.pgh.pa.us

over 12 years ago

In reply to: Emre Hasegeli (#15)

Re: GiST support for inet datatypes

Emre Hasegeli <emre@hasegeli.com> writes:

2014-02-17 14:54, Andres Freund <andres@2ndquadrant.com>:

You need to add a file for going from 1.0 to 1.1.

Thank you for the notice. I added them to the patches which touch only two
of the operator classes. It drops and re-creates operator classes as there
is not ALTER OPERATOR CLASS DROP DEFAULT command.

Dropping an operator class is quite unacceptable, as it will cause indexes
based on that class to go away (or more likely, cause the upgrade to fail,
if you didn't use CASCADE). What we've done in the past for changes that
are nominally unsupported is to make the upgrade scripts tweak the system
catalogs directly.

More generally, it doesn't look to me like these upgrade scripts are
complete; shouldn't they be creating some new objects, not just replacing
old ones?

We need to have a discussion as to whether it's actually sane for an
upgrade to remove the DEFAULT marking on a pre-existing opclass. It
strikes me that this would for instance break pg_dump dumps, in the sense
that the reloaded index would probably now have a different opclass
than before (since pg_dump would see no need to have put an explicit
opclass name into CREATE INDEX if it was the default in the old database).
Even if the new improved opclass is in all ways better, that would be
catastrophic for pg_upgrade I suspect. Unless the new opclass is
on-disk-compatible with the old; in which case we shouldn't be creating
a new opclass at all, but just modifying the definition of the old one.

In short we probably need to think a bit harder about what this patch is
proposing to do. It seems fairly likely to me that some other approach
would be a better idea.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#17

Emre Hasegeli

emre@hasegeli.com

over 12 years ago

In reply to: Tom Lane (#16)

Re: GiST support for inet datatypes

2014-02-17 22:16, Tom Lane <tgl@sss.pgh.pa.us>:

More generally, it doesn't look to me like these upgrade scripts are
complete; shouldn't they be creating some new objects, not just replacing
old ones?

The actual patches are on the previous mail [1]/messages/by-id/CAE2gYzxc0FXEwe59sFdUZNQ24c+fRBDMGXwwvbYvmeaNateidw@mail.gmail.com. I was just trying
to solve the problem that btree_gist cannot be loaded because of
the new operator class.

In short we probably need to think a bit harder about what this patch is
proposing to do. It seems fairly likely to me that some other approach
would be a better idea.

How about only removing the inet and the cidr operator classes
from btree_gist. btree-gist-drop-inet-v2.patch does that.

[1]: /messages/by-id/CAE2gYzxc0FXEwe59sFdUZNQ24c+fRBDMGXwwvbYvmeaNateidw@mail.gmail.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#18

Tom Lane

tgl@sss.pgh.pa.us

over 12 years ago

In reply to: Emre Hasegeli (#17)

Re: GiST support for inet datatypes

Emre Hasegeli <emre@hasegeli.com> writes:

How about only removing the inet and the cidr operator classes
from btree_gist. btree-gist-drop-inet-v2.patch does that.

I'm not sure which part of "no" you didn't understand, but to be
clear: you don't get to break existing installations.

Assuming that this opclass is sufficiently better than the existing one,
it would sure be nice if it could become the default; but I've not seen
any proposal in this thread that would allow that without serious upgrade
problems. I think the realistic alternatives so far are (1) new opclass
is not the default, or (2) this patch gets rejected.

We should probably expend some thought on a general approach to
replacing the default opclass for a datatype, because I'm sure this
will come up again. Right now I don't see a feasible way.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#19

Robert Haas

robertmhaas@gmail.com

over 12 years ago

In reply to: Tom Lane (#18)

Re: GiST support for inet datatypes

On Mon, Feb 17, 2014 at 3:56 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Emre Hasegeli <emre@hasegeli.com> writes:

How about only removing the inet and the cidr operator classes
from btree_gist. btree-gist-drop-inet-v2.patch does that.

I'm not sure which part of "no" you didn't understand, but to be
clear: you don't get to break existing installations.

Assuming that this opclass is sufficiently better than the existing one,
it would sure be nice if it could become the default; but I've not seen
any proposal in this thread that would allow that without serious upgrade
problems. I think the realistic alternatives so far are (1) new opclass
is not the default, or (2) this patch gets rejected.

We should probably expend some thought on a general approach to
replacing the default opclass for a datatype, because I'm sure this
will come up again. Right now I don't see a feasible way.

It seems odd for "default" to be a property of the opclass rather than
a separate knob. What comes to mind is something like:

ALTER TYPE sniffle SET DEFAULT OPERATOR CLASS FOR btree TO achoo_ops;

Not sure how much that helps.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#20

Tom Lane

tgl@sss.pgh.pa.us

over 12 years ago

In reply to: Robert Haas (#19)

Re: GiST support for inet datatypes

Robert Haas <robertmhaas@gmail.com> writes:

On Mon, Feb 17, 2014 at 3:56 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

We should probably expend some thought on a general approach to
replacing the default opclass for a datatype, because I'm sure this
will come up again. Right now I don't see a feasible way.

It seems odd for "default" to be a property of the opclass rather than
a separate knob. What comes to mind is something like:
ALTER TYPE sniffle SET DEFAULT OPERATOR CLASS FOR btree TO achoo_ops;
Not sure how much that helps.

Not at all, AFAICS. If it were okay to decide that some formerly-default
opclass is no longer default, then having such a command would be better
than manually manipulating pg_opclass.opcdefault --- but extension upgrade
scripts could certainly do the latter if they had to. The problem here
is whether it's upgrade-safe to make such a change at all; having
syntactic sugar doesn't make it safer.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#21

Emre Hasegeli

emre@hasegeli.com

over 12 years ago

In reply to: Tom Lane (#20)

#22

Tom Lane

tgl@sss.pgh.pa.us

over 12 years ago

In reply to: Emre Hasegeli (#21)

#23

Emre Hasegeli