point types in "DISTINCT" queries

Started by Jonathan S. Katzalmost 15 years ago8 messagesgeneral
Jump to latest
#1Jonathan S. Katz
jonathan.katz@excoventures.com

Hi,

I am running PostgreSQL 9.0.4 and I am getting an error with a SELECT DISTINCT query that contains a point type in the SELECT clause. To be more specific, a query such as:

-- explicit declaration that it's a point type
SELECT DISTINCT a.geocode::point
FROM a
WHERE a.region = 'x';

Will return the error:

ERROR: could not identify an equality operator for type point

I read the notes about how point types do not have "=" defined for them, but "~=" aka the "same as" operator (http://www.postgresql.org/docs/9.0/static/functions-geometry.html). For points, I would treat ~= as equality. I tried creating my own equality operator based on that:

CREATE OR REPLACE FUNCTION point_equality(point, point) RETURNS bool
AS 'SELECT $1 ~= $2;'
LANGUAGE SQL;

CREATE OPERATOR = (
LEFTARG = point,
RIGHTARG = point,
PROCEDURE = point_equality,
COMMUTATOR = =
);

And when I ran the query again:

ERROR: could not identify an equality operator for type point

I looked into the mailing list archives and found a potential answer on this thread: http://archives.postgresql.org/pgsql-general/2009-10/msg01122.php However I wanted to see if it was still necessary that I would need the complete btree operator class to run such a query. Are there plans to have a defined "=" operator on the point type? I can understand how the other geometric types, "=" would represent area, but AFAIK I think "=" could be safely applied on a point type (and i realize I could submit a patch for that :-) maybe depending on the resolution to this / refreshing my C...).

Is there possibly a relatively quick solution to this issue?

Thanks!

Jonathan

#2Jeff Davis
pgsql@j-davis.com
In reply to: Jonathan S. Katz (#1)
Re: point types in "DISTINCT" queries

On Tue, 2011-06-28 at 18:56 -0400, Jonathan S. Katz wrote:

I looked into the mailing list archives and found a potential answer
on this thread:
http://archives.postgresql.org/pgsql-general/2009-10/msg01122.php
However I wanted to see if it was still necessary that I would need
the complete btree operator class to run such a query.

Yes, the default btree operator class is used to find the equality
operator. Even though you have defined the operator "=", postgresql
doesn't rely on that meaning "equals" -- the btree operator class is
what imparts that meaning.

Are there plans to have a defined "=" operator on the point type? I
can understand how the other geometric types, "=" would represent
area, but AFAIK I think "=" could be safely applied on a point type
(and i realize I could submit a patch for that :-) maybe depending on
the resolution to this / refreshing my C...).

The built-in geometric types haven't received a lot of attention lately.
Most people who use geometric data use the PostGIS extension, which is a
sophisticated extension that can deal with that kind of data. You might
want to check that out and see if it meets your needs.

Perhaps someone is interested in bringing the built-in geometric types
up to speed; but I think most of the interest is moving things like this
out to extensions where they can be more easily be maintained by
interested parties.

If you'd like to submit a patch, I suggest first asking on -hackers
whether improvements to the built-in spatial types would be accepted.

Regards,
Jeff Davis

#3Magnus Hagander
magnus@hagander.net
In reply to: Jeff Davis (#2)
Re: point types in "DISTINCT" queries

On Wed, Jun 29, 2011 at 06:53, Jeff Davis <pgsql@j-davis.com> wrote:

On Tue, 2011-06-28 at 18:56 -0400, Jonathan S. Katz wrote:

I looked into the mailing list archives and found a potential answer
on this thread:
http://archives.postgresql.org/pgsql-general/2009-10/msg01122.php
However I wanted to see if it was still necessary that I would need
the complete btree operator class to run such a query.

Yes, the default btree operator class is used to find the equality
operator. Even though you have defined the operator "=", postgresql
doesn't rely on that meaning "equals" -- the btree operator class is
what imparts that meaning.

Are there plans to have a defined "=" operator on the point type?  I
can understand how the other geometric types, "=" would represent
area, but AFAIK I think "=" could be safely applied on a point type
(and i realize I could submit a patch for that :-) maybe depending on
the resolution to this / refreshing my C...).

The built-in geometric types haven't received a lot of attention lately.
Most people who use geometric data use the PostGIS extension, which is a
sophisticated extension that can deal with that kind of data. You might
want to check that out and see if it meets your needs.

Perhaps someone is interested in bringing the built-in geometric types
up to speed; but I think most of the interest is moving things like this
out to extensions where they can be more easily be maintained by
interested parties.

Given that they are the only ones supporting knn-gist, I would expect
them to actually become *more* popular with 9.1 - at least until such
time as postgis adds support for it...

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

#4Jonathan S. Katz
jonathan.katz@excoventures.com
In reply to: Magnus Hagander (#3)
Re: point types in "DISTINCT" queries

On Jun 29, 2011, at 10:25 AM, Magnus Hagander <magnus@hagander.net>
wrote:

On Wed, Jun 29, 2011 at 06:53, Jeff Davis <pgsql@j-davis.com> wrote:

On Tue, 2011-06-28 at 18:56 -0400, Jonathan S. Katz wrote:

I looked into the mailing list archives and found a potential answer
on this thread:
http://archives.postgresql.org/pgsql-general/2009-10/msg01122.php
However I wanted to see if it was still necessary that I would need
the complete btree operator class to run such a query.

Yes, the default btree operator class is used to find the equality
operator. Even though you have defined the operator "=", postgresql
doesn't rely on that meaning "equals" -- the btree operator class is
what imparts that meaning.

Are there plans to have a defined "=" operator on the point type? I
can understand how the other geometric types, "=" would represent
area, but AFAIK I think "=" could be safely applied on a point type
(and i realize I could submit a patch for that :-) maybe depending
on
the resolution to this / refreshing my C...).

The built-in geometric types haven't received a lot of attention
lately.
Most people who use geometric data use the PostGIS extension, which
is a
sophisticated extension that can deal with that kind of data. You
might
want to check that out and see if it meets your needs.

Perhaps someone is interested in bringing the built-in geometric
types
up to speed; but I think most of the interest is moving things like
this
out to extensions where they can be more easily be maintained by
interested parties.

Given that they are the only ones supporting knn-gist, I would expect
them to actually become *more* popular with 9.1 - at least until such
time as postgis adds support for it...

In fact that is my use-case - I will be performing nearest-neighbor
lookups (and will be running 9.1b2 on this data set shortly).
However, because most of the geospatial work is relatively
straightforward, I didn't want to use PostGIS for this application.
But that might change in the near future depending on the requirements.

But for now tasks like ensuing uniqueness amongst points are slightly
more difficult. My current solution is breaking out the (x,y) coords
into different columns

Jonathan

#5Magnus Hagander
magnus@hagander.net
In reply to: Jonathan S. Katz (#4)
Re: point types in "DISTINCT" queries

On Wed, Jun 29, 2011 at 16:38, Jonathan S. Katz
<jonathan.katz@excoventures.com> wrote:

On Jun 29, 2011, at 10:25 AM, Magnus Hagander <magnus@hagander.net> wrote:

On Wed, Jun 29, 2011 at 06:53, Jeff Davis <pgsql@j-davis.com> wrote:

On Tue, 2011-06-28 at 18:56 -0400, Jonathan S. Katz wrote:

I looked into the mailing list archives and found a potential answer
on this thread:
http://archives.postgresql.org/pgsql-general/2009-10/msg01122.php
However I wanted to see if it was still necessary that I would need
the complete btree operator class to run such a query.

Yes, the default btree operator class is used to find the equality
operator. Even though you have defined the operator "=", postgresql
doesn't rely on that meaning "equals" -- the btree operator class is
what imparts that meaning.

Are there plans to have a defined "=" operator on the point type?  I
can understand how the other geometric types, "=" would represent
area, but AFAIK I think "=" could be safely applied on a point type
(and i realize I could submit a patch for that :-) maybe depending on
the resolution to this / refreshing my C...).

The built-in geometric types haven't received a lot of attention lately.
Most people who use geometric data use the PostGIS extension, which is a
sophisticated extension that can deal with that kind of data. You might
want to check that out and see if it meets your needs.

Perhaps someone is interested in bringing the built-in geometric types
up to speed; but I think most of the interest is moving things like this
out to extensions where they can be more easily be maintained by
interested parties.

Given that they are the only ones supporting knn-gist, I would expect
them to actually become *more* popular with 9.1 - at least until such
time as postgis adds support for it...

In fact that is my use-case - I will be performing nearest-neighbor lookups
(and will be running 9.1b2 on this data set shortly).  However, because most
of the geospatial work is relatively straightforward, I didn't want to use
PostGIS for this application.  But that might change in the near future
depending on the requirements.

But for now tasks like ensuing uniqueness amongst points are slightly more
difficult.   My current solution is breaking out the (x,y) coords into
different columns

Have you tried using an exclusion constraint? Not entirely sure, but I
think that might work.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

#6Jonathan S. Katz
jonathan.katz@excoventures.com
In reply to: Magnus Hagander (#5)
Re: point types in "DISTINCT" queries

On Jun 29, 2011, at 10:42 AM, Magnus Hagander wrote:

On Wed, Jun 29, 2011 at 16:38, Jonathan S. Katz
<jonathan.katz@excoventures.com> wrote:

In fact that is my use-case - I will be performing nearest-neighbor lookups
(and will be running 9.1b2 on this data set shortly). However, because most
of the geospatial work is relatively straightforward, I didn't want to use
PostGIS for this application. But that might change in the near future
depending on the requirements.

But for now tasks like ensuing uniqueness amongst points are slightly more
difficult. My current solution is breaking out the (x,y) coords into
different columns

Have you tried using an exclusion constraint? Not entirely sure, but I
think that might work.

Did a quick experiment:

Using =~

ALTER TABLE a ADD EXCLUDE USING gist (geocode WITH ~=);

Results:

ERROR: could not create exclusion constraint "a_geocode_excl"
DETAIL: Key (geocode)=((33.8367126,-117.9164627)) conflicts with key (geocode)=((33.8367128,-117.9164627)).

Which means it *should* work, but first I would need to clean up the data and find the duplicates. I was hoping this might work:

SELECT geocode, count(*)
FROM a
GROUP BY a.geocode
HAVING count(*) > 1;

But:

ERROR: could not identify an equality operator for type point

So I would have to just find the points one-by-one until the exclusion constraint passes.

Now, using the custom = operator:

ALTER TABLE app_address ADD EXCLUDE USING gist (geocode WITH =);

Results:

ERROR: operator =(point,point) is not a member of operator family "point_ops"
DETAIL: The exclusion operator must be related to the index operator class for the constraint.

Jonathan

#7Jeff Davis
pgsql@j-davis.com
In reply to: Jonathan S. Katz (#6)
Re: point types in "DISTINCT" queries

On Wed, 2011-06-29 at 11:37 -0400, Jonathan S. Katz wrote:

Which means it *should* work, but first I would need to clean up the data and find the duplicates. I was hoping this might work:

SELECT geocode, count(*)
FROM a
GROUP BY a.geocode
HAVING count(*) > 1;

Maybe you could use a self-join as a workaround for now, just to clean
up the data?

SELECT geocode, other_columns from a a1, a a2 where a1.other_columns <>
a2.other_columns and a1.geocode ~= a2.geocode;

Regards,
Jeff Davis

#8Jonathan S. Katz
jonathan.katz@excoventures.com
In reply to: Jeff Davis (#7)
Re: point types in "DISTINCT" queries

On Wed, 2011-06-29 at 11:37 -0400, Jonathan S. Katz wrote:

Which means it *should* work, but first I would need to clean up the data and find the duplicates. I was hoping this might work:

SELECT geocode, count(*)
FROM a
GROUP BY a.geocode
HAVING count(*) > 1;

Maybe you could use a self-join as a workaround for now, just to clean
up the data?

SELECT geocode, other_columns from a a1, a a2 where a1.other_columns <>
a2.other_columns and a1.geocode ~= a2.geocode;

That worked perfectly - turned out it was just two rows. And subsequently executing the exclusion constraint on "=~" also worked perfectly as expected.

The larger issue I face with now is slightly out of my control without further hacking. I'm developing an app with Django and I wrote an extension that allows me to use the point type natively in Python. I ran into the original issue while an automatically generated query was executed in the admin section. I know this could be viewed as something pertaining to Django, but the goal I had in mind was making PostgreSQL functionality more accessible in a different software layer.

I will find a workaround for the above, as I am sure I can do some application-level hacking.

Thanks for your help!

Jonathan