@ versus ~, redux

Started by Tom Lane over 19 years ago | 34 messages | hackers
#1 Tom Lane
tgl@sss.pgh.pa.us

Awhile back I complained that while all the core geometric datatypes
use operator @ to mean "contained in" and operator ~ to mean "contains",
contrib/cube and contrib/seg switch the meanings:
http://archives.postgresql.org/pgsql-hackers/2005-06/msg01238.php

There was some followup discussion generally agreeing that we ought to
get these things in sync, but then Andrew@supernews threw a monkey
wrench into the proceedings by suggesting we change to different names
entirely:
http://archives.postgresql.org/pgsql-hackers/2005-06/msg01263.php
That is not necessarily a bad idea, but I didn't want to get drawn
into a debate about exactly what alternative names to adopt, so I
dropped the problem for the time being.

I now find that the GIN patch has propagated the contrib meanings
of these operators into the core:
http://archives.postgresql.org/pgsql-general/2006-09/msg00087.php
and at this point I'm going to put my foot down and insist that
we do *something*. I won't hold still for fundamentally backward
meanings of the same operator name within the core datatypes.

I can see various things that we might consider doing:

1. Just flip the names of the two operators added by the GIN patch.

2. #1 plus flip the names of the various contrib operators that are
out of sync (Michael Fuhr points out that contrib/intarray is out
of step too ... are there others?).

3. Leave the existing op names as-is in core and contrib, but consider
them deprecated and add new ops with consistently-chosen names.
(The new ops introduced by GIN should only exist with the new names.)

#1 isn't doing anything towards solving the underlying problem.
#2 has got obvious backwards-compatibility issues for contrib users.
#3 may or may not be technically feasible (I'm not sure if we can
support multiple operators occupying the same slot in an opclass),
besides which choosing the names to use could degenerate to a flamewar.

Thoughts, votes, better ideas? The only option I'm *not* open to is
leaving HEAD as it stands.

regards, tom lane

#2 Joshua D. Drake
jd@commandprompt.com
In reply to: Tom Lane (#1)
Re: @ versus ~, redux

> I can see various things that we might consider doing:
>
> 1. Just flip the names of the two operators added by the GIN patch.
>
> 2. #1 plus flip the names of the various contrib operators that are
> out of sync (Michael Fuhr points out that contrib/intarray is out
> of step too ... are there others?).
>
> 3. Leave the existing op names as-is in core and contrib, but consider
> them deprecated and add new ops with consistently-chosen names.
> (The new ops introduced by GIN should only exist with the new names.)
>
> #1 isn't doing anything towards solving the underlying problem.
> #2 has got obvious backwards-compatibility issues for contrib users.

+1 on #2 with the following caveat. When we publish the release notes,
we have a specific section that says:

Compatibility changes from previous releases. Which IMHO should be there
anyway as there are always compatibility issues from release to release.

Joshua D. Drake

--

=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/

#3 Oleg Bartunov
oleg@sai.msu.su
In reply to: Joshua D. Drake (#2)
Re: @ versus ~, redux

On Sun, 3 Sep 2006, Joshua D. Drake wrote:

>> I can see various things that we might consider doing:
>>
>> 1. Just flip the names of the two operators added by the GIN patch.
>>
>> 2. #1 plus flip the names of the various contrib operators that are
>> out of sync (Michael Fuhr points out that contrib/intarray is out
>> of step too ... are there others?).
>>
>> 3. Leave the existing op names as-is in core and contrib, but consider
>> them deprecated and add new ops with consistently-chosen names.
>> (The new ops introduced by GIN should only exist with the new names.)

#3 looks good to me. Too many users. We should give them time for
upgrading. Probably, we need special chapter "To be obsoleted in the next
release" in Release notes.

>> #1 isn't doing anything towards solving the underlying problem.
>> #2 has got obvious backwards-compatibility issues for contrib users.
>
> +1 on #2 with the following caveat. When we publish the release notes, we
> have a specific section that says:
>
> Compatibility changes from previous releases. Which IMHO should be there
> anyway as there are always compatibility issues from release to release.
>
> Joshua D. Drake

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

#4 Joshua D. Drake
jd@commandprompt.com
In reply to: Oleg Bartunov (#3)
Re: @ versus ~, redux

Oleg Bartunov wrote:

> On Sun, 3 Sep 2006, Joshua D. Drake wrote:
>
>>> I can see various things that we might consider doing:
>>>
>>> 1. Just flip the names of the two operators added by the GIN patch.
>>>
>>> 2. #1 plus flip the names of the various contrib operators that are
>>> out of sync (Michael Fuhr points out that contrib/intarray is out
>>> of step too ... are there others?).
>>>
>>> 3. Leave the existing op names as-is in core and contrib, but consider
>>> them deprecated and add new ops with consistently-chosen names.
>>> (The new ops introduced by GIN should only exist with the new names.)
>
> #3 looks good to me. Too many users. We should give them time for
> upgrading. Probably, we need special chapter "To be obsoleted in the next
> release" in Release notes.

Users will have time to upgrade should they be responsible. Nobody in
their right mind is going to upgrade to 8.2 on a production site the
day it is released.

They are going to test it with their code, and their work load. If it
takes them an extra day to implement query changes (or even an extra
month), good. It will serve them better in the long run.

Sincerely,

Joshua D. Drake


#5 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Oleg Bartunov (#3)
Re: @ versus ~, redux

Oleg Bartunov <oleg@sai.msu.su> writes:

>> 3. Leave the existing op names as-is in core and contrib, but consider
>> them deprecated and add new ops with consistently-chosen names.
>> (The new ops introduced by GIN should only exist with the new names.)
>
> #3 looks good to me. Too many users.

Not only that, but it'd be a serious problem for something like a SQL
script to be cross-version-compatible if we reverse the meanings of the
existing operators.

AFAIK all the operators in question exist only in GIST opclasses, so one
possible solution to the multiple-operators-per-slot problem is to
extend the opclasses --- ie, teach the gist_consistent methods to
support two different strategy numbers that do the same thing. Ugly
and tedious, but it'd preserve backward compatibility.

regards, tom lane
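
A rough illustration of the "two strategy numbers that do the same thing" idea, written as a Python sketch rather than actual GiST C code. The strategy numbers, the `consistent` function, and the 1-D interval representation are all assumptions made up for this example; a real gist_consistent method also has to handle internal index pages, which this leaf-level toy ignores.

```python
# Hypothetical strategy numbers, chosen only for this illustration.
OLD_CONTAINS = 7          # slot used by the deprecated operator name
OLD_CONTAINED_IN = 8
NEW_CONTAINS = 13         # extra slot for the new, consistently named operator
NEW_CONTAINED_IN = 14


def consistent(key, query, strategy):
    """Toy leaf-level check on closed 1-D intervals given as (lo, hi) tuples.

    Two different strategy numbers are accepted for each predicate, so an
    old and a new operator name can both be served by the same opclass.
    """
    if strategy in (OLD_CONTAINS, NEW_CONTAINS):
        # key contains query
        return key[0] <= query[0] and query[1] <= key[1]
    if strategy in (OLD_CONTAINED_IN, NEW_CONTAINED_IN):
        # key is contained in query
        return query[0] <= key[0] and key[1] <= query[1]
    raise ValueError(f"unsupported strategy number: {strategy}")
```

The point of the sketch is only that the dispatch grows by one case per duplicated operator; the predicate code itself is shared, which is why the approach is tedious rather than hard.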

#6 Noname
jreich@root.net
In reply to: Tom Lane (#5)
Re: @ versus ~, redux

I also vote +1 for #3. Not only are there too many users, but simply
switching the sense of these operators will mean that code will still run
but give incorrect answers. While it would be nice to think that all
client code has decent regression testing, this ain't the case.

If we are going to fix things so that all packages use the same sense, we
should slowly deprecate the current notation, and outright drop it for 8.2
or 8.3. What is the consensus: do it this release or next?

I also like the '@<' and '@>' notation as this gives a clear visual cue.

Josh Reich
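
The "code will still run, but give incorrect answers" risk can be sketched outside the database. The following Python is only an illustration, not PostgreSQL code: `contains`, `contained_in`, the two operator tables, and `run_query` are hypothetical stand-ins, with Python sets playing the role of the data types.

```python
# Hypothetical stand-ins for the two containment predicates.
def contains(a, b):
    """True if set a contains every element of set b."""
    return a >= b


def contained_in(a, b):
    """True if every element of set a is also in set b."""
    return a <= b


# Operator name -> meaning, as the thread describes contrib/cube and contrib/seg.
CONTRIB_MEANING = {"@": contains, "~": contained_in}
# The same operator names after a flip to the core geometric convention.
CORE_MEANING = {"@": contained_in, "~": contains}


def run_query(op_table, op, rows, probe):
    """Return the rows r for which `r op probe` holds under op_table."""
    return [r for r in rows if op_table[op](r, probe)]


rows = [frozenset({1}), frozenset({1, 2}), frozenset({1, 2, 3})]
probe = frozenset({1, 2})

# The "query text" (r @ probe) is identical; only the operator table differs.
before = run_query(CONTRIB_MEANING, "@", rows, probe)
after = run_query(CORE_MEANING, "@", rows, probe)
```

Neither call raises an error, yet `before` and `after` hold different rows, which is exactly the kind of change that only a regression test comparing results would catch.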


#7 Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#1)
Re: @ versus ~, redux

Tom Lane wrote:

> Awhile back I complained that while all the core geometric datatypes
> use operator @ to mean "contained in" and operator ~ to mean "contains",
> contrib/cube and contrib/seg switch the meanings:
> http://archives.postgresql.org/pgsql-hackers/2005-06/msg01238.php
>
> There was some followup discussion generally agreeing that we ought to
> get these things in sync, but then Andrew@supernews threw a monkey
> wrench into the proceedings by suggesting we change to different names
> entirely:
> http://archives.postgresql.org/pgsql-hackers/2005-06/msg01263.php
> That is not necessarily a bad idea, but I didn't want to get drawn
> into a debate about exactly what alternative names to adopt, so I
> dropped the problem for the time being.
>
> I now find that the GIN patch has propagated the contrib meanings
> of these operators into the core:
> http://archives.postgresql.org/pgsql-general/2006-09/msg00087.php
> and at this point I'm going to put my foot down and insist that
> we do *something*. I won't hold still for fundamentally backward
> meanings of the same operator name within the core datatypes.
>
> I can see various things that we might consider doing:
>
> 1. Just flip the names of the two operators added by the GIN patch.
>
> 2. #1 plus flip the names of the various contrib operators that are
> out of sync (Michael Fuhr points out that contrib/intarray is out
> of step too ... are there others?).
>
> 3. Leave the existing op names as-is in core and contrib, but consider
> them deprecated and add new ops with consistently-chosen names.
> (The new ops introduced by GIN should only exist with the new names.)
>
> #1 isn't doing anything towards solving the underlying problem.
> #2 has got obvious backwards-compatibility issues for contrib users.
> #3 may or may not be technically feasible (I'm not sure if we can
> support multiple operators occupying the same slot in an opclass),
> besides which choosing the names to use could degenerate to a flamewar.
>
> Thoughts, votes, better ideas? The only option I'm *not* open to is
> leaving HEAD as it stands.

How about?:

4. do 1+3, i.e. flip the GIN operators to keep core consistency, but
deprecate the operators for both contrib and core. Something more
visually like set ops would be ideal.

cheers

andrew

#8 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#7)
Re: @ versus ~, redux

Andrew Dunstan <andrew@dunslane.net> writes:

> How about?:
> 4. do 1+3, i.e. flip the GIN operators to keep core consistency, but
> deprecate the operators for both contrib and core. Something more
> visually like set ops would be ideal.

If we're going to adopt new preferred names, I see no reason to support
the old confusing names for operators that have never existed before
8.2. There is no backward-compatibility argument to be made there.

regards, tom lane

#9 Mark Dilger
mark.dilger@enterprisedb.com
In reply to: Tom Lane (#1)
Re: @ versus ~, redux

Tom Lane wrote:

> I can see various things that we might consider doing:
>
> 1. Just flip the names of the two operators added by the GIN patch.
>
> 2. #1 plus flip the names of the various contrib operators that are
> out of sync (Michael Fuhr points out that contrib/intarray is out
> of step too ... are there others?).
>
> 3. Leave the existing op names as-is in core and contrib, but consider
> them deprecated and add new ops with consistently-chosen names.
> (The new ops introduced by GIN should only exist with the new names.)
>
> #1 isn't doing anything towards solving the underlying problem.
> #2 has got obvious backwards-compatibility issues for contrib users.
> #3 may or may not be technically feasible (I'm not sure if we can
> support multiple operators occupying the same slot in an opclass),
> besides which choosing the names to use could degenerate to a flamewar.

I suggest: #4 Standardize on new names and completely drop old naming
scheme, both in core and in contrib.

#2 is much too dangerous, because people may not recognize that their
code needs updating. #3 introduces new code in core that has no other
legitimate purpose (or does someone see a reason why this is generally
useful?)

#4 would force people to notice that their code needs updating, which is
far safer than hoping people will notice.

mark

#10 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Mark Dilger (#9)
Re: @ versus ~, redux

Mark Dilger <pgsql@markdilger.com> writes:

> I suggest: #4 Standardize on new names and completely drop old naming
> scheme, both in core and in contrib.

Deliberately breaking code that has always worked doesn't sound very
appetizing to me. If there were simply no good alternative to it, then
maybe, but generally we have higher regard for backwards compatibility
than to do it just because it's neater.

I agree with planning to arrive at state #4 after a transitional release
or three, but to do it now with no warning will simply bring us visits
from angry pitchfork-bearing villagers...

regards, tom lane

#11 Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#8)
Re: @ versus ~, redux

Tom Lane wrote:

> Andrew Dunstan <andrew@dunslane.net> writes:
>
>> How about?:
>> 4. do 1+3, i.e. flip the GIN operators to keep core consistency, but
>> deprecate the operators for both contrib and core. Something more
>> visually like set ops would be ideal.
>
> If we're going to adopt new preferred names, I see no reason to support
> the old confusing names for operators that have never existed before
> 8.2. There is no backward-compatibility argument to be made there.

You're right. I misread your original proposal. I vote for #3.

cheers

andrew

#12 Chris Browne
cbbrowne@acm.org
In reply to: Tom Lane (#1)
Re: @ versus ~, redux

tgl@sss.pgh.pa.us (Tom Lane) wrote:

> I agree with planning to arrive at state #4 after a transitional release
> or three, but to do it now with no warning will simply bring us visits
> from angry pitchfork-bearing villagers...

But then we can send out Trogdor...

Trogdor!
Trogdor!

Burninating the countryside,
Burninating the peasants,
Burninating all the people and the thatched-roof cottages
Thatched-roof cottages!

And the Trogdor comes in the NNNNNNNNIIIIIIIIIIIIIIGGGHHHHHHHHHHHHHHHHHHHHH!!!
<http://en.wikipedia.org/wiki/Trogdor>

Sorry, but there's something about fighting pitchforks with fire...
--
select 'cbbrowne' || '@' || 'acm.org';
http://cbbrowne.com/info/postgresql.html
"Once you accept that the world is a giant computer run by white mice,
all other movies fade into insignificance." -- Mutsumi Takahashi

#13 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#11)
Re: @ versus ~, redux

Andrew Dunstan <andrew@dunslane.net> writes:

> Tom Lane wrote:
>
>> 3. Leave the existing op names as-is in core and contrib, but consider
>> them deprecated and add new ops with consistently-chosen names.
>> (The new ops introduced by GIN should only exist with the new names.)
>
> You're right. I misread your original proposal. I vote for #3.

OK, so if everyone is leaning to #3, the name game remains to be played.
Do we all agree on this:

"x @> y" means "x contains y"
"x @< y" means "x is contained in y"

Are we all prepared to sign a solemn oath to commit hara-kiri if we
invent a new datatype that gets this wrong? No? Maybe these still
aren't obvious enough.

BTW, even with the gist_consistent hack there's still a bit of a
technical problem: pg_operator can represent the knowledge that @> and
@< are commutators, and that @ and ~ are commutators, but not (at the
same time) that @> and @ are commutators. This is not a fatal objection
but it's a tad annoying --- I think there are cases where the planner
would miss possible optimizations if it can't see this. Anybody see a
suitably low-cost fix? Does it not matter if every GIST opclass has
mappings for both operator pairs?

regards, tom lane
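
The commutator limitation described above can be modeled with a toy catalog that, like the message says of pg_operator, stores a single commutator per operator. This is a Python sketch under that assumption; `OperatorCatalog`, `set_commutators`, and `try_cross_link` are hypothetical names and do not correspond to any PostgreSQL API.

```python
class OperatorCatalog:
    """Toy catalog in which each operator has exactly one commutator slot."""

    def __init__(self):
        self.commutator = {}

    def set_commutators(self, a, b):
        """Record a and b as each other's commutator.

        Fails if either operator already points at a different commutator,
        because there is only one slot per operator.
        """
        for op, other in ((a, b), (b, a)):
            existing = self.commutator.get(op)
            if existing is not None and existing != other:
                raise ValueError(f"{op} already has commutator {existing}")
        self.commutator[a] = b
        self.commutator[b] = a


catalog = OperatorCatalog()
catalog.set_commutators("@>", "@<")   # proposed new names
catalog.set_commutators("@", "~")     # existing names


def try_cross_link():
    """Attempt to also record that @> and the old @ commute with each other."""
    try:
        catalog.set_commutators("@>", "@")
    except ValueError as exc:
        return str(exc)
    return None
```

Both pairs fit, but the cross-pair link is rejected because the slot for `@>` is already occupied, so anything that reasons from commutator links alone cannot see that the old and new names are interchangeable.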

#14 Michael Glaesemann
grzm@seespotcode.net
In reply to: Tom Lane (#13)
Re: @ versus ~, redux

On Sep 4, 2006, at 12:44 , Tom Lane wrote:

> OK, so if everyone is leaning to #3, the name game remains to be played.
> Do we all agree on this:
>
> "x @> y" means "x contains y"
> "x @< y" means "x is contained in y"
>
> Are we all prepared to sign a solemn oath to commit hara-kiri if we
> invent a new datatype that gets this wrong? No? Maybe these still
> aren't obvious enough.

When I've been working on range/interval stuff, I tried to come up
with a self-consistent set of operator symbols for the Allen
operators, which includes the "contains" and "is contained in" pair.
Here's what I came up with.

Where r1 and r2 are ranges

r1 >> r2     r1 is strictly during r2, i.e., r1 is a strict subset of r2
r1 << r2     r2 is strictly during r1, i.e., r2 is a strict subset of r1
<< and >> are meant to evoke the (strict) subset (⊂ or &sub;) and
superset (⊃ or &sup;) operators.

r1 <<= r2    r1 is a superset of r2
r1 =>> r2    r1 is a subset of r2

<<= and =>> are meant to evoke the subset (⊆ or &sube;) and superset
(⊇ or &supe;) operators.

Assuming the meaning of contains and is contained in is inclusive
(rather than strict), then we'd have

a <<= b : a contains b
a =>> b : a is contained by b

I've included the other Allen operators at the bottom for completeness.

Michael Glaesemann
grzm seespotcode net

r1 = r2      r1 equals r2
r1 <> r2     r1 does not equal r2
For the following, the < or > indicates the relative position of the
two ranges if they were depicted on a line that increases from left
to right.

r1 <| r2     r1 strictly meets r2, i.e., begin(r2) is next(end(r1))
r1 |> r2     r2 strictly meets r1, i.e., begin(r1) is next(end(r2))
The | is meant to evoke the meeting point of r1 and r2. They don't
overlap, they are just abutting. The < or > "points" to the direction
of the range it points to relative to the other range, i.e., r1
is to the left of r2 on a line that increases from left to right.

r1 </ r2     r1 is before r2
r1 /> r2     r1 is after r2
The / is meant to evoke the fact that they are not abutting.

r1 <& r2     r1 strictly overlaps r2
r1 &> r2     r2 strictly overlaps r1
The & is meant to evoke "and", in that there is something the two
ranges share.

r1 @< r2     r1 starts r2
r1 @> r2     r2 starts r1
r1 >@ r2     r1 finishes r2
r1 <@ r2     r2 finishes r1
The @ is meant to indicate the point where the two ranges share a
begin or end point. E.g., for r1 @< r2, r1 and r2 start together, and
end(r1) < end(r2). For r1 <@ r2, begin(r1) < begin(r2), but they
share the same end point.
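
For readers who want to check the definitions above against concrete values, here is a minimal Python sketch of several of the listed relations. It assumes discrete ranges with integer, inclusive end points, so that next(end(r)) is simply `end + 1`; the `Range` type and the function names are hypothetical, and "strictly during" follows Allen's usual definition (no shared end points), which is an assumption about the intended meaning.

```python
from typing import NamedTuple


class Range(NamedTuple):
    begin: int
    end: int  # inclusive


def meets(r1, r2):            # r1 <| r2
    """r1 strictly meets r2: r2 begins right after r1 ends."""
    return r2.begin == r1.end + 1


def before(r1, r2):           # r1 </ r2
    """r1 is before r2 with a gap, i.e. not abutting."""
    return r1.end + 1 < r2.begin


def overlaps(r1, r2):         # r1 <& r2
    """r1 strictly overlaps r2: r1 starts first and ends inside r2."""
    return r1.begin < r2.begin <= r1.end < r2.end


def starts(r1, r2):           # r1 @< r2
    """r1 and r2 start together and r1 ends first."""
    return r1.begin == r2.begin and r1.end < r2.end


def finishes(r1, r2):         # r1 >@ r2
    """r1 and r2 end together and r1 begins later."""
    return r1.end == r2.end and r1.begin > r2.begin


def strictly_during(r1, r2):  # r1 >> r2
    """r1 lies inside r2 without touching either end point."""
    return r2.begin < r1.begin and r1.end < r2.end
```

For example, `meets(Range(1, 3), Range(4, 6))` holds, while `before(Range(1, 3), Range(4, 6))` does not, because the two ranges abut.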

#15 Matteo Beccati
php@beccati.com
In reply to: Tom Lane (#13)
Re: @ versus ~, redux

Tom Lane wrote:

> OK, so if everyone is leaning to #3, the name game remains to be played.
> Do we all agree on this:
>
> "x @> y" means "x contains y"
> "x @< y" means "x is contained in y"
>
> Are we all prepared to sign a solemn oath to commit hara-kiri if we
> invent a new datatype that gets this wrong? No? Maybe these still
> aren't obvious enough.

Does this mean that also contrib/ltree operators will likely change for
consistency?

ltree @> ltree
- returns TRUE if left argument is an ancestor of right argument
(or equal).
ltree <@ ltree
- returns TRUE if left argument is a descendant of right argument
(or equal).

Best regards
--
Matteo Beccati
http://phpadsnew.com
http://phppgads.com
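
The ltree semantics quoted above can be mimicked on dotted label paths. This is a Python sketch of the described behaviour only, not the contrib/ltree implementation; the function names and the sample paths are made up for the illustration.

```python
def is_ancestor_or_equal(left, right):
    """Mimics `left @> right`: left is an ancestor of right, or equal."""
    l_labels = left.split(".")
    r_labels = right.split(".")
    # An ancestor's label list is a prefix of the descendant's label list.
    return len(l_labels) <= len(r_labels) and r_labels[:len(l_labels)] == l_labels


def is_descendant_or_equal(left, right):
    """Mimics `left <@ right`: left is a descendant of right, or equal."""
    return is_ancestor_or_equal(right, left)
```

Read this way, `"Top.Science" @> "Top.Science.Astronomy"` is "the subtree rooted at the left path contains the right path", which lines up with the proposed "x @> y means x contains y".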

#16 Andrew - Supernews
andrew+nonews@supernews.com
In reply to: Tom Lane (#1)
Re: @ versus ~, redux

On 2006-09-04, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> OK, so if everyone is leaning to #3, the name game remains to be played.
> Do we all agree on this:
>
> "x @> y" means "x contains y"
> "x @< y" means "x is contained in y"

While I suggested something like those, I would also suggest that the
existing operators for inet/cidr be taken into consideration:

x >>= y    "x contains y"
x >> y     "x strictly contains y"
x <<= y    "x is contained in y"
x << y     "x is strictly contained in y"

(obviously these don't all necessarily make sense for all types)

These have the advantage of resembling set notation more closely and being
in use in one existing core type.

--
Andrew, Supernews
http://www.supernews.com - individual and corporate NNTP services
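
The four inet/cidr relations listed above have close analogues in Python's standard `ipaddress` module, which makes the strict versus non-strict distinction easy to try out. This is a sketch of the semantics only; the four wrapper function names are hypothetical, and `subnet_of` requires Python 3.7 or later.

```python
import ipaddress


def contains_or_equals(x, y):     # x >>= y
    """x contains y, or the two networks are equal."""
    return y.subnet_of(x)


def strictly_contains(x, y):      # x >> y
    """x contains y and the two networks differ."""
    return y.subnet_of(x) and x != y


def contained_in_or_equals(x, y):  # x <<= y
    """x is contained in y, or the two networks are equal."""
    return x.subnet_of(y)


def strictly_contained_in(x, y):  # x << y
    """x is contained in y and the two networks differ."""
    return x.subnet_of(y) and x != y


big = ipaddress.ip_network("10.0.0.0/8")
small = ipaddress.ip_network("10.1.0.0/16")
```

With these values, `contains_or_equals(big, small)` and `strictly_contains(big, small)` both hold, while `strictly_contains(big, big)` does not.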

#17 Bruce Momjian
bruce@momjian.us
In reply to: Matteo Beccati (#15)
Re: @ versus ~, redux

Matteo Beccati <php@beccati.com> writes:

> Tom Lane wrote:
>
>> OK, so if everyone is leaning to #3, the name game remains to be played.
>> Do we all agree on this:
>>
>> "x @> y" means "x contains y"
>> "x @< y" means "x is contained in y"
>>
>> Are we all prepared to sign a solemn oath to commit hara-kiri if we
>> invent a new datatype that gets this wrong? No? Maybe these still
>> aren't obvious enough.
>
> Does this mean that also contrib/ltree operators will likely change for
> consistency?
>
> ltree @> ltree
> - returns TRUE if left argument is an ancestor of right argument (or
> equal).
> ltree <@ ltree
> - returns TRUE if left argument is a descendant of right argument (or
> equal).

If you consider ltree entries to be sets containing all their children then
those sound consistent.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com

#18 Bruce Momjian
bruce@momjian.us
In reply to: Bruce Momjian (#17)
Re: @ versus ~, redux

Gregory Stark <stark@enterprisedb.com> writes:

> Matteo Beccati <php@beccati.com> writes:
>
>> Tom Lane wrote:
>>
>>> "x @< y" means "x is contained in y"
>>
>> ltree <@ ltree
>
> If you consider ltree entries to be sets containing all their children then
> those sound consistent.

Oops, sorry for the noise.

--
greg

#19 Zeugswetter Andreas SB SD
ZeugswetterA@spardat.at
In reply to: Bruce Momjian (#18)
Re: @ versus ~, redux

>>> "x @< y" means "x is contained in y"
>>
>> ltree <@ ltree
>
> If you consider ltree entries to be sets containing all their children
> then those sound consistent.

Now we get to decide whether "<@" was better than the now proposed "@<" :-)
I like <@. (or we stay clear by using the inet ops)

Andreas

#20 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Matteo Beccati (#15)
Re: @ versus ~, redux

Matteo Beccati <php@beccati.com> writes:

> Tom Lane wrote:
>
>> OK, so if everyone is leaning to #3, the name game remains to be played.
>> Do we all agree on this:
>>
>> "x @> y" means "x contains y"
>> "x @< y" means "x is contained in y"
>
> Does this mean that also contrib/ltree operators will likely change for
> consistency?

Oh, I hadn't noticed that ltree spells it "<@" rather than "@<". I'd be
inclined to stick with the ltree precedent.

regards, tom lane

#21 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Michael Glaesemann (#14)
#22 Matteo Beccati
php@beccati.com
In reply to: Tom Lane (#20)
#23 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew - Supernews (#16)
#24 Michael Glaesemann
grzm@seespotcode.net
In reply to: Tom Lane (#23)
#25 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Michael Glaesemann (#24)
#26 Jeff Davis
pgsql@j-davis.com
In reply to: Tom Lane (#23)
#27 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jeff Davis (#26)
#28 Zeugswetter Andreas SB SD
ZeugswetterA@spardat.at
In reply to: Tom Lane (#27)
#29 Matteo Beccati
php@beccati.com
In reply to: Tom Lane (#23)
#30 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Zeugswetter Andreas SB SD (#28)
#31 Bruce Momjian
bruce@momjian.us
In reply to: Zeugswetter Andreas SB SD (#28)
#32 Jeff Davis
pgsql@j-davis.com
In reply to: Tom Lane (#27)
#33 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jeff Davis (#32)
#34 Michael Glaesemann
grzm@seespotcode.net
In reply to: Tom Lane (#33)