Idea on how to simplify comparing two sets

Started by Joel Jacobsonabout 9 years ago24 messageshackers

Jump to latest

Joel Jacobson

joel@trustly.com

about 9 years ago

Hi hackers,

Currently there is no simple way to check if two sets are equal.

Looks like no RDBMS in the world has a simple command for it.

You have to do something like:

WITH
T1 AS (SELECT * FROM Foo WHERE FooID BETWEEN 1 AND 10000),
T2 AS (SELECT * FROM Bar WHERE BarID BETWEEN 1 AND 10000)
SELECT
GREATEST(
(SELECT COUNT(*) FROM T1),
(SELECT COUNT(*) FROM T2)
)
=
(SELECT COUNT(*) FROM (
SELECT * FROM T1
INTERSECT ALL
SELECT * FROM T2
) AS X)
INTO _Identical;

or,

SELECT 'Missmatch!' WHERE EXISTS (
SELECT * FROM Foo
FULL JOIN Bar ON (Foo.FooID = Bar.BarID AND
Foo IS NOT DISTINCT FROM Bar)
WHERE TRUE
AND ( Foo.FooID BETWEEN 1 AND 10000 AND
Bar.BarID BETWEEN 1 AND 10000 )
AND ( Foo.FooID IS NULL OR
Bar.BarID IS NULL);

Introducing new SQL keywords is of course not an option,
since it would possibly break backwards compatibility.

So here is an idea that doesn't break backwards compatibility:

Let's give a meaning for the existing IS DISTINCT and IS NOT DISTINCT,
that is currently a syntax error when used between two sets.

SELECT 1 IS DISTINCT FROM SELECT 1;
ERROR: syntax error at or near "SELECT"
LINE 1: SELECT 1 IS DISTINCT FROM SELECT 1;

The example above could be written as:

_Identical := (
SELECT * FROM Foo WHERE FooID BETWEEN 1 AND 10000
IS NOT DISTINCT FROM
SELECT * FROM Bar WHERE BarID BETWEEN 1 AND 10000
);

Which would set _Identical to TRUE if the two sets are equal,
and FALSE otherwise.

Since it's currently a syntax error, there is no risk for changed
behaviour for any existing executable queries.

Thoughts?

/Joel

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Anders Granlund

anders.granlund@trustly.com

about 9 years ago

In reply to: Joel Jacobson (#1)

Re: Idea on how to simplify comparing two sets

What about this ambiguity?

SELECT TRUE
WHERE FALSE
IS NOT DISTINCT FROM
(SELECT TRUE)

On Tue, Feb 7, 2017 at 4:13 PM, Joel Jacobson <joel@trustly.com> wrote:

Show quoted text

Hi hackers,

Currently there is no simple way to check if two sets are equal.

Looks like no RDBMS in the world has a simple command for it.

You have to do something like:

WITH
T1 AS (SELECT * FROM Foo WHERE FooID BETWEEN 1 AND 10000),
T2 AS (SELECT * FROM Bar WHERE BarID BETWEEN 1 AND 10000)
SELECT
GREATEST(
(SELECT COUNT(*) FROM T1),
(SELECT COUNT(*) FROM T2)
)
=
(SELECT COUNT(*) FROM (
SELECT * FROM T1
INTERSECT ALL
SELECT * FROM T2
) AS X)
INTO _Identical;

or,

SELECT 'Missmatch!' WHERE EXISTS (
SELECT * FROM Foo
FULL JOIN Bar ON (Foo.FooID = Bar.BarID AND
Foo IS NOT DISTINCT FROM Bar)
WHERE TRUE
AND ( Foo.FooID BETWEEN 1 AND 10000 AND
Bar.BarID BETWEEN 1 AND 10000 )
AND ( Foo.FooID IS NULL OR
Bar.BarID IS NULL);

Introducing new SQL keywords is of course not an option,
since it would possibly break backwards compatibility.

So here is an idea that doesn't break backwards compatibility:

Let's give a meaning for the existing IS DISTINCT and IS NOT DISTINCT,
that is currently a syntax error when used between two sets.

SELECT 1 IS DISTINCT FROM SELECT 1;
ERROR: syntax error at or near "SELECT"
LINE 1: SELECT 1 IS DISTINCT FROM SELECT 1;

The example above could be written as:

_Identical := (
SELECT * FROM Foo WHERE FooID BETWEEN 1 AND 10000
IS NOT DISTINCT FROM
SELECT * FROM Bar WHERE BarID BETWEEN 1 AND 10000
);

Which would set _Identical to TRUE if the two sets are equal,
and FALSE otherwise.

Since it's currently a syntax error, there is no risk for changed
behaviour for any existing executable queries.

Thoughts?

/Joel

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Joel Jacobson

joel@trustly.com

about 9 years ago

In reply to: Anders Granlund (#2)

Re: Idea on how to simplify comparing two sets

But that's already a valid statement, so there is no ambiguity:

SELECT TRUE WHERE FALSE
IS NOT DISTINCT FROM
(SELECT TRUE);
bool
------
(0 rows)

If you want to compare the set (SELECT TRUE WHERE FALSE) with the set
(SELECT TRUE) then just add parenthesis:
(SELECT TRUE WHERE FALSE)
IS NOT DISTINCT FROM
(SELECT TRUE);
ERROR: syntax error at or near "IS"
LINE 2: IS NOT DISTINCT FROM
^

Which is currently invalid syntax.

On Tue, Feb 7, 2017 at 4:40 PM, Anders Granlund
<anders.granlund@trustly.com> wrote:

What about this ambiguity?

SELECT TRUE
WHERE FALSE
IS NOT DISTINCT FROM
(SELECT TRUE)

On Tue, Feb 7, 2017 at 4:13 PM, Joel Jacobson <joel@trustly.com> wrote:

Hi hackers,

Currently there is no simple way to check if two sets are equal.

Looks like no RDBMS in the world has a simple command for it.

You have to do something like:

WITH
T1 AS (SELECT * FROM Foo WHERE FooID BETWEEN 1 AND 10000),
T2 AS (SELECT * FROM Bar WHERE BarID BETWEEN 1 AND 10000)
SELECT
GREATEST(
(SELECT COUNT(*) FROM T1),
(SELECT COUNT(*) FROM T2)
)
=
(SELECT COUNT(*) FROM (
SELECT * FROM T1
INTERSECT ALL
SELECT * FROM T2
) AS X)
INTO _Identical;

or,

SELECT 'Missmatch!' WHERE EXISTS (
SELECT * FROM Foo
FULL JOIN Bar ON (Foo.FooID = Bar.BarID AND
Foo IS NOT DISTINCT FROM Bar)
WHERE TRUE
AND ( Foo.FooID BETWEEN 1 AND 10000 AND
Bar.BarID BETWEEN 1 AND 10000 )
AND ( Foo.FooID IS NULL OR
Bar.BarID IS NULL);

Introducing new SQL keywords is of course not an option,
since it would possibly break backwards compatibility.

So here is an idea that doesn't break backwards compatibility:

Let's give a meaning for the existing IS DISTINCT and IS NOT DISTINCT,
that is currently a syntax error when used between two sets.

SELECT 1 IS DISTINCT FROM SELECT 1;
ERROR: syntax error at or near "SELECT"
LINE 1: SELECT 1 IS DISTINCT FROM SELECT 1;

The example above could be written as:

_Identical := (
SELECT * FROM Foo WHERE FooID BETWEEN 1 AND 10000
IS NOT DISTINCT FROM
SELECT * FROM Bar WHERE BarID BETWEEN 1 AND 10000
);

Which would set _Identical to TRUE if the two sets are equal,
and FALSE otherwise.

Since it's currently a syntax error, there is no risk for changed
behaviour for any existing executable queries.

Thoughts?

/Joel

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

--
Joel Jacobson

Mobile: +46703603801
Trustly.com | Newsroom | LinkedIn | Twitter

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Tom Lane

tgl@sss.pgh.pa.us

about 9 years ago

In reply to: Joel Jacobson (#1)

Re: Idea on how to simplify comparing two sets

Joel Jacobson <joel@trustly.com> writes:

Currently there is no simple way to check if two sets are equal.

Uh ... maybe check whether SELECT set1 EXCEPT SELECT set2
and SELECT set2 EXCEPT SELECT set1 are both empty?

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Joel Jacobson

joel@trustly.com

about 9 years ago

In reply to: Tom Lane (#4)

Re: Idea on how to simplify comparing two sets

On Tue, Feb 7, 2017 at 4:58 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Joel Jacobson <joel@trustly.com> writes:

Currently there is no simple way to check if two sets are equal.

Uh ... maybe check whether SELECT set1 EXCEPT SELECT set2
and SELECT set2 EXCEPT SELECT set1 are both empty?

Yes, that's one way, but it's ugly as you have to repeat yourself and
write both sets two times.
Not an issue for small queries, but if you have two big queries stored
in a .sql file,
you would have to modify both places for each query and always make
sure they are identical.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

David Fetter

david@fetter.org

about 9 years ago

In reply to: Joel Jacobson (#1)

Re: Idea on how to simplify comparing two sets

On Tue, Feb 07, 2017 at 04:13:40PM +0100, Joel Jacobson wrote:

Hi hackers,

Currently there is no simple way to check if two sets are equal.

Assuming that a and b each has at least one NOT NULL column, is this
simple enough? Based on nothing much, I'm assuming here that the IS
NOT NULL test is faster than IS NULL, but you can flip that and change
the array to {0} with identical effect.

WITH t AS (
SELECT a AS a, b AS b, (a IS NOT NULL)::int + (b IS NOT NULL)::int AS ind
FROM a FULL JOIN b ON ...
)
SELECT array_agg(DISTINCT ind) = '{2}'
FROM t;

Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

David Fetter

david@fetter.org

about 9 years ago

In reply to: David Fetter (#6)

Re: Idea on how to simplify comparing two sets

On Tue, Feb 07, 2017 at 09:10:17AM -0800, David Fetter wrote:

On Tue, Feb 07, 2017 at 04:13:40PM +0100, Joel Jacobson wrote:

Hi hackers,

Currently there is no simple way to check if two sets are equal.

Assuming that a and b each has at least one NOT NULL column, is this
simple enough? Based on nothing much, I'm assuming here that the IS
NOT NULL test is faster than IS NULL, but you can flip that and change
the array to {0} with identical effect.

WITH t AS (
SELECT a AS a, b AS b, (a IS NOT NULL)::int + (b IS NOT NULL)::int AS ind
FROM a FULL JOIN b ON ...
)
SELECT array_agg(DISTINCT ind) = '{2}'
FROM t;

You don't actually need a and b in the inner target list.

WITH t AS (
SELECT (a IS NOT NULL)::int + (b IS NOT NULL)::int AS ind
FROM a FULL JOIN b ON ...
)
SELECT array_agg(DISTINCT ind) = '{2}'
FROM t;

This could be shortened further to the following if we ever implement
DISTINCT for window functions, which might involve implementing
DISTINCT via hashing more generally, which means hashable
types...whee!

SELECT array_agg(DISTINCT (a IS NOT NULL)::int + (b IS NOT NULL)::int) OVER () = '{2}'
FROM a FULL JOIN b ON ...

Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Joel Jacobson

joel@trustly.com

about 9 years ago

In reply to: David Fetter (#7)

Re: Idea on how to simplify comparing two sets

On Tue, Feb 7, 2017 at 6:28 PM, David Fetter <david@fetter.org> wrote:

This could be shortened further to the following if we ever implement
DISTINCT for window functions, which might involve implementing
DISTINCT via hashing more generally, which means hashable
types...whee!

SELECT array_agg(DISTINCT (a IS NOT NULL)::int + (b IS NOT NULL)::int) OVER () = '{2}'
FROM a FULL JOIN b ON ...

That's still a lot more syntax than just adding "IS NOT DISTINCT FROM"
in between the sets.

I just thought the general approach at looking for ways to express new
things in SQL,
without introducing new keywords, but instead rely on existing keywords but
that currently are syntax errors when used in some semantic way,
is an interesting approach to allow extending the SQL syntax without
breaking backwards compatibility.

Are there any historical examples of when this approach has been used
to make progress in PostgreSQL?

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

David G. Johnston

david.g.johnston@gmail.com

about 9 years ago

In reply to: Tom Lane (#4)

Re: Idea on how to simplify comparing two sets

On Tue, Feb 7, 2017 at 8:58 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Joel Jacobson <joel@trustly.com> writes:

Currently there is no simple way to check if two sets are equal.

Uh ... maybe check whether SELECT set1 EXCEPT SELECT set2
and SELECT set2 EXCEPT SELECT set1 are both empty?

SELECT set1 FULL EXCEPT SELECT set2 ?

Matches with the concept and syntax of "FULL JOIN".

SELECT set1 XOR SELECT set2

That said I'm not sure how much we want to go down this road on our own.
It'd be nice to have when its needed but its not something that gets much
visibility on these lists to suggest a large pent-up demand.

IS NOT DISTINCT FROM doesn't imply bi-direction any better than EXCEPT does
... if we are going to add new syntax I'd say it should at least do that.

David J.

#10

Tom Lane

tgl@sss.pgh.pa.us

about 9 years ago

In reply to: David G. Johnston (#9)

Re: Idea on how to simplify comparing two sets

"David G. Johnston" <david.g.johnston@gmail.com> writes:

On Tue, Feb 7, 2017 at 8:58 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Joel Jacobson <joel@trustly.com> writes:

Currently there is no simple way to check if two sets are equal.

Uh ... maybe check whether SELECT set1 EXCEPT SELECT set2
and SELECT set2 EXCEPT SELECT set1 are both empty?

SELECT set1 FULL EXCEPT SELECT set2 ?
Matches with the concept and syntax of "FULL JOIN".

Actually ... now that you mention full join, I believe this works:

select * from (select ...) s1 full join (select ...) s2
on ((s1.*)=(s2.*)) where s1.* is distinct from s2.*;

That said I'm not sure how much we want to go down this road on our own.
It'd be nice to have when its needed but its not something that gets much
visibility on these lists to suggest a large pent-up demand.

Yeah, if this isn't in the standard and not in other databases either,
that would seem to suggest that it's not a big requirement.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#11

Fabien COELHO

coelho@cri.ensmp.fr

about 9 years ago

In reply to: Joel Jacobson (#1)

Re: Idea on how to simplify comparing two sets

Currently there is no simple way to check if two sets are equal.

My 0.02€:

Assuming that you mean set = relation, and that there is a key (which
should be the case for a set otherwise tuples cannot be distinguished, so
this is not really a set), and assuming not null other data, then:

CREATE TABLE TAB1(k INT PRIMARY KEY, data TEXT NOT NULL);
INSERT INTO TAB1 VALUES (1, 'one'), (2, 'two'), (3, 'three');

CREATE TABLE TAB2(k INT PRIMARY KEY, data TEXT NOT NULL);
INSERT INTO TAB2 VALUES (1, 'one'), (2, 'deux'), (4, 'four');

The TAB1 to TAB2 difference is computed with:

SELECT
CASE WHEN t1.k IS NULL THEN 'INSERT'
WHEN t2.k IS NULL THEN 'DELETE'
ELSE 'UPDATE'
END AS operation,
COALESCE(t1.k, t2.k) AS key
FROM TAB1 AS t1
FULL JOIN TAB2 AS t2 USING (k)
WHERE
t1.data IS NULL OR t2.data IS NULL OR t1.data <> t2.data;

Results in:

UPDATE | 2
DELETE | 3
INSERT | 4

If there is no differences, then sets are equals...

If there is no associated data, a simpler condition:

WHERE t1.k IS NULL OR t2.k IS NULL;

--
Fabien.
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#12

Merlin Moncure

mmoncure@gmail.com

about 9 years ago

In reply to: Joel Jacobson (#5)

Re: Idea on how to simplify comparing two sets

1On Tue, Feb 7, 2017 at 9:46 PM, Joel Jacobson <joel@trustly.com> wrote:

On Tue, Feb 7, 2017 at 4:58 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Joel Jacobson <joel@trustly.com> writes:

Currently there is no simple way to check if two sets are equal.

Uh ... maybe check whether SELECT set1 EXCEPT SELECT set2
and SELECT set2 EXCEPT SELECT set1 are both empty?

Yes, that's one way, but it's ugly as you have to repeat yourself and
write both sets two times.
Not an issue for small queries, but if you have two big queries stored
in a .sql file,
you would have to modify both places for each query and always make
sure they are identical.

A CTE might help:

WITH left AS (something complex),
right AS (something complex)
SELECT COUNT(*) = 0 AS good FROM
(
SELECT * FROM left EXCEPT SELECT * FROM right
UNION ALL
SELECT * FROM right EXCEPT SELECT * FROM left
) q;

This isn't the most efficient solution, but is easily abstracted into
dynamic SQL (meaning, you could pass both queries as arguments to a
checker function). Another, similar approach is to abstract the query
behind a view which ISTM is a practice you are underutilizing based on
your comments :-).

If I were in a hurry and the dataset was enormous I would probably
dump both queries identically ordered to a .csv, and do:
diff left.csv right.csv | head -1

in bash or something like that. Not sure if the utility of a
bidirectional EXCEPT is enough to justify adding custom syntax for
that approach. I use the 'double EXCEPT' tactic fairly often and
understand the need, but the bar for non-standard syntax is pretty
high (and has been getting higher over the years, I think).

merlin

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#13

Pantelis Theodosiou

ypercube@gmail.com

about 9 years ago

In reply to: Tom Lane (#4)

Re: Idea on how to simplify comparing two sets

On Tue, Feb 7, 2017 at 3:58 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Joel Jacobson <joel@trustly.com> writes:

Currently there is no simple way to check if two sets are equal.

Uh ... maybe check whether SELECT set1 EXCEPT SELECT set2
and SELECT set2 EXCEPT SELECT set1 are both empty?

regards, tom lane

Yes, if the wanted result is true or false, something like this:

SELECT EXISTS (TABLE a EXCEPT TABLE b)
OR EXISTS (TABLE b EXCEPT TABLE a) ;

And if a new operator was added (in the same category as UNION and
EXCEPT), it could be:

SELECT EXISTS (TABLE a XORSET TABLE b) ;

What about using the = and <> operators in sets? Is the following
allowed in the standard?

SELECT (TABLE a) <> (TABLE b) ;

#14

Pantelis Theodosiou

ypercube@gmail.com

about 9 years ago

In reply to: Joel Jacobson (#1)

Re: Idea on how to simplify comparing two sets

On Tue, Feb 7, 2017 at 3:13 PM, Joel Jacobson <joel@trustly.com> wrote:

Hi hackers,

Currently there is no simple way to check if two sets are equal.

Looks like no RDBMS in the world has a simple command for it.

You have to do something like:

...

Introducing new SQL keywords is of course not an option,
since it would possibly break backwards compatibility.

I'm not advocating it but I don't see how introducing new SQL keywords
breaks backwards compatibility.

Pantelis Theodosiou

#15

Robert Haas

robertmhaas@gmail.com

about 9 years ago

In reply to: Pantelis Theodosiou (#14)

Re: Idea on how to simplify comparing two sets

On Wed, Feb 8, 2017 at 4:24 AM, Pantelis Theodosiou <ypercube@gmail.com> wrote:

I'm not advocating it but I don't see how introducing new SQL keywords
breaks backwards compatibility.

It does at least a little bit. This starts failing:

select 1 new_keyword form blah;

(now you have to insert AS or quote the keyword)

If the new keyword is partially or fully reserved, then anybody who is
using that column in a now-impermissible way has to change names of
conflicting tables, columns, functions, etc.

But of course we do add keywords in every release, where it's
warranted by the value of the new feature. I think the problem for
this proposed feature is not that adding new keywords is a completely
insane thing to do but that (1) there are lots of other good ways to
do more or less what is being requested with this new syntax, so it's
not clear that we need a new one and (2) there are cases where it
might be ambiguous which meaning of IS NOT DISTINCT FROM is met.
Joel's answer to #2 is to just prefer the existing meaning wherever
it's possible to assign that meaning and use the new meaning only in
cases where the existing meaning is impossible, but (2a) if you tried
to code it up, you'd probably find that it's quite difficult to make
bison generate a grammar that works that way, because bison isn't
designed around the idea of giving things a meaning only if they don't
already have one and (2b) apparently ambiguities can be confusing to
users even if they've been eliminated in the formal grammar.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#16

Tom Lane

tgl@sss.pgh.pa.us

about 9 years ago

In reply to: Robert Haas (#15)

Re: Idea on how to simplify comparing two sets

Robert Haas <robertmhaas@gmail.com> writes:

On Wed, Feb 8, 2017 at 4:24 AM, Pantelis Theodosiou <ypercube@gmail.com> wrote:

I'm not advocating it but I don't see how introducing new SQL keywords
breaks backwards compatibility.

It does at least a little bit.

Yes. I think a new set-operation keyword would inevitably have to be
fully reserved --- UNION, INTERSECT, and EXCEPT all are --- which means
that you'd break every application that has used that word as a table,
column, or function name.

Generally speaking, we try very darn hard not to introduce new reserved
words that are not called out as reserved in the SQL standard. (And even
for those, we've sometimes made the grammar jump through hoops so as
not to reserve a word that we didn't reserve previously.)

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#17

David Fetter

david@fetter.org

about 9 years ago

In reply to: Tom Lane (#16)

Re: Idea on how to simplify comparing two sets

On Wed, Feb 08, 2017 at 11:22:56AM -0500, Tom Lane wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Wed, Feb 8, 2017 at 4:24 AM, Pantelis Theodosiou <ypercube@gmail.com> wrote:

I'm not advocating it but I don't see how introducing new SQL keywords
breaks backwards compatibility.

It does at least a little bit.

Yes. I think a new set-operation keyword would inevitably have to
be fully reserved --- UNION, INTERSECT, and EXCEPT all are --- which
means that you'd break every application that has used that word as
a table, column, or function name.

I've long wanted a SYMMETRIC DIFFERENCE join type, that being the only
elementary set operation not included in join types, but nobody at the
SQL standards committee seems to have cared enough to help.

Generally speaking, we try very darn hard not to introduce new
reserved words that are not called out as reserved in the SQL
standard. (And even for those, we've sometimes made the grammar
jump through hoops so as not to reserve a word that we didn't
reserve previously.)

We just never know what new keywords the standards committee will
dream up, or what silliness they'll introduce in the grammar :/

Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#18

Tom Lane

tgl@sss.pgh.pa.us

about 9 years ago

In reply to: David Fetter (#17)

Re: Idea on how to simplify comparing two sets

David Fetter <david@fetter.org> writes:

On Wed, Feb 08, 2017 at 11:22:56AM -0500, Tom Lane wrote:

Yes. I think a new set-operation keyword would inevitably have to
be fully reserved --- UNION, INTERSECT, and EXCEPT all are --- which
means that you'd break every application that has used that word as
a table, column, or function name.

I've long wanted a SYMMETRIC DIFFERENCE join type, that being the only
elementary set operation not included in join types, but nobody at the
SQL standards committee seems to have cared enough to help.

I wonder whether you could shoehorn it in with no new reserved word
by spelling it "EXCEPT SYMMETRIC", which could be justified by the
precedent of BETWEEN SYMMETRIC. But not sure what to do with
duplicate rows (ie, if LHS has two occurrences of row X and RHS
has one occurrence, do you output X?)

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#19

David G. Johnston

david.g.johnston@gmail.com

about 9 years ago

In reply to: Tom Lane (#18)

Re: Idea on how to simplify comparing two sets

On Wed, Feb 8, 2017 at 10:30 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

David Fetter <david@fetter.org> writes:

On Wed, Feb 08, 2017 at 11:22:56AM -0500, Tom Lane wrote:

Yes. I think a new set-operation keyword would inevitably have to
be fully reserved --- UNION, INTERSECT, and EXCEPT all are --- which
means that you'd break every application that has used that word as
a table, column, or function name.

I've long wanted a SYMMETRIC DIFFERENCE join type, that being the only
elementary set operation not included in join types, but nobody at the
SQL standards committee seems to have cared enough to help.

I wonder whether you could shoehorn it in with no new reserved word
by spelling it "EXCEPT SYMMETRIC", which could be justified by the
precedent of BETWEEN SYMMETRIC. But not sure what to do with
duplicate rows (ie, if LHS has two occurrences of row X and RHS
has one occurrence, do you output X?)

Without SYMMETRIC its defined to return:

max(m-n,0)

with SYMMETRIC I'd think that would just change to:

abs(m-n)

Then you still have to apply ALL or DISTINCT on the above result.

David J.

#20

Jim Nasby

Jim.Nasby@BlueTreble.com

about 9 years ago

In reply to: Tom Lane (#10)

Re: Idea on how to simplify comparing two sets

On 2/7/17 12:03 PM, Tom Lane wrote:

That said I'm not sure how much we want to go down this road on our own.
It'd be nice to have when its needed but its not something that gets much
visibility on these lists to suggest a large pent-up demand.

Yeah, if this isn't in the standard and not in other databases either,
that would seem to suggest that it's not a big requirement.

FWIW I've found myself needing the precursor to this this (give me the
full diff) at least a couple times, and it's quite a PITA on anything
but a trivial relation.

It's also not possible to make this easier via an SRF because you don't
know in advance what the result set looks like. So the best I've ever
come up with is a file that can be included in psql that depends on
having set two psql variables to the names of relations that can be
queried (and if you need more than a relation you need to create a temp
view).

I've wondered about the possibility of allowing PLs the ability to
dynamically define their return type based on their arguments. That
would allow for an SRF to handle this case, and would be significantly
more flexible than trying to do that using pseudotypes.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#21